The  Influence  of  Dynamic  Shadows  on  Presence  in  Immersive 

Virtual  Environments 

Mel  Slater,  Martin  Usoh,  Yiorgos  Chrysanthou, 
London  Parallel  Applications  Centre, 

QMW  University  of  London, 
1.  Introduction 

We  describe  an  experiment  to  examine  the  effect  of  shadows  on  two  different  aspects  of  the 
experience  of  immersion  in  a  virtual  environment  (VE):  depth  perception  and  presence.  It  is 
well-known  that  shadows  can  significantly  enhance  depth  perception  in  everyday  reality 
[1,5,7].  Shadows  provide  alternative  views  of  objects,  and  provide  direct  information  about 
their  spatial  relationships  with  surrounding  surfaces.  VR  systems  typically  do  not  support 
shadows,  and  yet  potential  applications,  especially  in  the  training  sphere,  will  require 
participants  to  make  judgements  about  such  relationships.  Even  the  simple  task  of  moving  to  an 
object  and  picking  it  up  can  be  problematic  when  observers  cannot  easily  determine  their 
distance  from  the  object,  or  its  distance  from  surrounding  objects.  We  introduce  dynamic 
shadows  to  examine  whether  such  task  performance  can  be  enhanced. 

We  have  argued  elsewhere  [8]  that  presence  is  the  key  to  the  science  of  immersive  virtual 
environments (virtual reality). We distinguish, however, between immersion and presence.
Immersion  includes  the  extent  to  which  the  computer  displays  are  extensive,  surrounding, 
inclusive,  vivid  and  matching.  The  displays  are  more  extensive  the  more  sensory  systems  that 
they  accommodate.  They  are  surrounding  to  the  extent  that  information  can  arrive  at  the 
person's  sense  organs  from  any  (virtual)  direction.  They  are  inclusive  to  the  extent  that  all 
external  sensory  data  (from  physical  reality)  is  shut  out.  Their  vividness  is  a  function  of  the 
variety and richness of the sensory information they can generate [11]. In the context of visual
displays,  for  example,  colour  displays  are  more  vivid  than  monochrome,  and  displays 
depicting  shadows  are  more  vivid  than  those  that  do  not.  Vividness  is  concerned  with  the 
richness,  information  content,  resolution  and  quality  of  the  displays.  Finally,  immersion 
requires that there is a match between the participant's proprioceptive feedback about body
movements,  and  the  information  generated  on  the  displays.  A  turn  of  the  head  should  result  in 
a  corresponding  change  to  the  visual  display,  and,  for  example,  to  the  auditory  displays  so  that 
sound  direction  is  invariant  to  the  orientation  of  the  head.  Matching  requires  body  tracking,  at 
least  head  tracking,  but  generally  the  greater  the  degree  of  body  mapping,  the  greater  the  extent 
to  which  the  movements  of  the  body  can  be  accurately  reproduced. 

Immersion  also  requires  a  self-representation  in  the  VE  -  a  Virtual  Body  (VB).  The  VB  is  both 
part  of  the  perceived  environment,  and  represents  the  being  that  is  doing  the  perceiving. 
Perception  in  the  VE  is  centred  on  the  position  in  virtual  space  of  the  VB  -  e.g.,  visual 
perception  from  the  viewpoint  of  the  eyes  in  the  head  of  the  VB. 

Immersion  is  an  objective  description  of  what  any  particular  system  does  provide.  Presence  is  a 
state  of  consciousness,  the  (psychological)  sense  of  being  in  the  virtual  environment. 
Participants who are highly present should experience the VE as more the engaging reality than
the  surrounding  world,  and  consider  the  environment  specified  by  the  displays  as  places  visited 
rather  than  as  images  seen.  Behaviours  in  the  VE  should  be  consistent  with  behaviours  that 
would  have  occurred  in  everyday  reality  in  similar  circumstances. 

Presence  requires  that  the  participant  identify  with  the  VB  -  that  its  movements  are  his/her 
movements,  and  that  the  VB  comes  to  "be"  the  body  of  that  person  in  the  VE.  We  speculate 
that the additional information provided by shadows about the movements of the VB in relation
to the surfaces of the VE can enhance this degree of association, and hence the
degree of presence. However, we were unable to test this in the current experiment. We do,
however, consider the proposition that shadows, by increasing the degree of vividness of the
visual displays, will enhance the sense of presence.

2.  Experiment 
2.1  Scenario 

The experimental scenario consisted of a virtual room, the elevation of which is shown in
[...] wall, but behind a small screen. Another green spear is at
position G. The subject begins the experiment by moving to the red square (X), and facing the
[...] instruction is to choose the spear nearest the wall, observing from position X.
[...] subject moves towards it, picks it up and returns to X. There the
subject turns to the left, facing a target on the far wall. The subject must orient the spear to
point approximately towards the target, fire and guide it towards the target by hand
movements. The instructions were that the spear must be shot at the target, and that it must be
[...] Finally, the subject must bring the green spear to
position X. This was repeated six times for each subject.

[...] experiment each subject was given a sheet explaining these procedures,
[...] the experimenter talking the subject through the entire
scenario. Runs 1 through 5 were carried out by the subject without intervention by the
experimenter. Between each run the subject was advised to relax with closed eyes, either with
or without the head-mounted display (HMD, see below), although all but one continued to
wear it during the two minutes that it took to load the program for the subsequent run. Each of
the five runs was the same apart from the distances of the red spears from the wall. Also
some runs displayed dynamic shadows of the spears and the small screen, while others did
not.

Eight  subjects  were  selected  by  the  experimenters  asking  people  throughout  the  QMW  campus 
(in canteens, bars, laboratories, offices) whether they wished to take part in a study of "virtual
reality". People from our own Department were not included.

The  design  is  shown  in  Table  1,  which  indicates  the  positions  of  the  point-light  source  for 
those runs that included shadows. Note that of the 40 runs, 20 included shadows.

Table  1 

Runs  of  the  Experiment  for  Each  Subject 
1, 2, 3, 4 denote the four point-light positions of Figure 1
0  denotes  no  shadows 


2.2  Spatial  Variables  and  Hypotheses 


The  variables  measured  in  order  to  assess  the  effects  of  shadows  on  spatial  judgement  were  as 
follows: 

Spear  Selected 

S:  the  spear  selected  from  observation  position  X.  The  spears  ranged  from  50  cm  to  90  cm 
from  the  wall,  positioned  with  10  cm  variations.  The  small  screen  in  front  of  the  spears 
obscured  the  positions  where  they  touched  the  floor,  for  any  subject  standing  at  position  X. 
Also,  because  their  distances  from  the  wall  varied  only  slightly,  their  heights,  as  judged  from 
position  X  would  look  the  same.  It  was  therefore  very  difficult  to  judge  which  spear  was 
nearest  the  wall.  Variable  S  was  the  rank  order  of  the  spear  chosen,  where  1  would  be  the 
nearest  to  the  wall,  and  5  the  furthest. 

The  hypothesis  was  that  subjects  would  be  able  to  use  the  shadows  of  spears  on  the  walls  to 
aid  their  judgement  about  the  closeness  to  the  walls,  so  that  those  runs  that  included  shadows 
would  result  in  a  greater  number  of  correct  spears  being  chosen. 

Distances  from  Target 

C:  this  is  the  distance  of  the  point  of  the  spear  from  the  centre  of  the  target  at  the  position  that  it 
was  stopped  in  flight  by  the  subject. 

The  hypothesis  was  that  the  subjects  would  be  able  to  use  the  shadow  of  the  spear  in  flight, 
especially  its  shadow  on  the  target  wall,  to  help  guide  the  spear  towards  the  target.  Therefore, 
the  mean  distance  should  be  less  for  the  shadow  runs  than  for  the  non-shadow  runs. 

D:  this  is  the  distance  that  the  point  of  the  spear  was  behind  or  in  front  of  the  target  at  the 
position  that  it  was  stopped  by  the  subject. 

The  hypothesis  is  as  for  C,  except  that  here  we  would  expect  a  greater  shadow  effect  since  the 
action  required  to  stop  the  spear  in  flight  (releasing  a  button  on  the  hand-held  3D  mouse)  is 
simpler  than  that  involved  in  guiding  the  spear  to  the  bulls  eye.  Moreover,  at  the  moment  the 
spear  point  touched  the  target  wall,  it  would  also  meet  its  shadow. 
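
The two targeting measures can be illustrated with a short sketch. The coordinate frame, the function name `targeting_errors` and the representation of the target wall by a unit normal pointing into the room are assumptions made for illustration; how C and D were actually computed is not described here. The sign of D follows the convention used in the results (positive in front of the target, negative behind it).

```python
import math


def targeting_errors(tip, target_centre, wall_normal):
    """Return (C, D) for a spear whose point stopped at `tip`.

    C: Euclidean distance from the spear point to the target centre.
    D: signed perpendicular distance from the spear point to the target
       wall, positive in front of the wall and negative behind it.
    wall_normal is assumed to be a unit vector pointing from the wall
    into the room (an illustrative convention).
    """
    offset = [t - c for t, c in zip(tip, target_centre)]
    c = math.sqrt(sum(o * o for o in offset))
    d = sum(o * n for o, n in zip(offset, wall_normal))
    return c, d


# Hypothetical case: target centre at the origin, wall in the plane x = 0,
# spear stopped 0.3 units behind the wall and 0.4 units off centre.
c, d = targeting_errors((-0.3, 0.4, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

In this sketch a spear that stops exactly on the target wall gives D = 0 whatever its distance C from the bulls eye, which is why the two measures are treated separately.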


2.3  Presence  Variables  and  Hypotheses 

In  previous  studies  we  have  used  subjective  reported  levels  of  "presence"  based  on  a 
questionnaire. In this method subjective presence was assessed in three ways: the sense of
"being there" in the VE, the extent to which there were times that the virtual world seemed
more  the  presenting  reality  than  the  real  world,  and  the  sense  of  visiting  somewhere  rather  than 
just  seeing  images.  In  the  present  study  these  three  basic  determinants  were  elaborated  into  six 
questions,  each  measured  on  a  7-point  scale,  where  lowest  presence  is  1,  and  highest  is  7  (see 
Appendix  A).  The  overall  presence  score  (P)  was  conservatively  taken  as  the  number  of  high 
(6 or 7) ratings amongst the six questions, so that 0 ≤ P ≤ 6.
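
The scoring rule can be sketched as follows. This is a minimal illustration; the function name `presence_score` and the example ratings are hypothetical and are not data from the experiment.

```python
def presence_score(ratings, high_threshold=6):
    """Count the 'high' ratings (6 or 7) amongst six 7-point responses."""
    if len(ratings) != 6:
        raise ValueError("expected six questionnaire ratings")
    if any(r < 1 or r > 7 for r in ratings):
        raise ValueError("ratings must lie on the 1-7 scale")
    return sum(1 for r in ratings if r >= high_threshold)


# Hypothetical responses from one subject.
example = presence_score([7, 5, 6, 3, 6, 4])
```

Counting only the 6 and 7 ratings discards the difference between, say, a 2 and a 5, which is the sense in which the score is conservative.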

Although  we  have  obtained  good  results  with  such  subjective  measures  before,  in  the  shadow 
experiment  we  introduced  in  addition  a  more  "objective"  measurement  of  presence.  This  was 
achieved  by  having  one  particular  object  (a  radio)  in  both  the  real  world  of  the  laboratory  in 
which  the  experiment  took  place  and  the  virtual  world  of  the  room  with  spears. 

Just  before  the  practice  run  the  subjects  were  shown  a  radio  on  the  floor  against  a  large  screen 
in  the  laboratory.  They  were  told  that  they  would  see  "the  radio"  in  the  virtual  world,  and  that 
occasionally  it  would  switch  itself  on.  Whenever  they  heard  the  sound  they  should  point 
towards "the radio", and press a button on the hand-held mouse. This would act as an
"infra-red" device to switch the radio off. Before they entered into the VE the radio was
momentarily switched on, deliberately not tuned to any particular channel, therefore causing it
to play an audible but meaningless tone. Each time that the subject entered into the VE, i.e., at
the start of each run, they were told: "Orient yourself by looking for the red square on the floor
and the radio". The radio was placed in the VE at the same position relative to the red square as
the real radio was to the position of the subject just before entering the VE.

At four moments during the experiment, always while the subject was (virtually) on the red
square, the real radio was moved to one of four different positions. These were 1m apart from
each other, on a line coincident (in the real world) with the small screen by which the radio was
located (in the virtual world). The ordering was selected randomly before the start of the
experiment. The virtual radio was always in the same place. Therefore the subject would hear
the sound coming from a different location compared to the visible position of the radio. The
idea is that (other things being equal), a high degree of presence would lead to the subject
pointing towards the virtual radio rather than the real one. Hence we tried to cause and use the
conflict between virtual and real information as an assessment of presence. Those (two)
subjects who did ask about the contradiction were told "Just point at where you think the radio
is". Throughout, both the real radio and the virtual radio were referred to as "the radio",
deliberately allowing for a confusion in the minds of the subjects.

It  is  important  to  note  that  we  mean  "presence"  in  a  strong  behavioural  sense  with  respect  to 
this  measurement.  The  questionnaire  attempts  to  elicit  the  subject’s  state  of  mind.  The  radio 
method  though  is  concerned  only  with  their  behaviour.  If  they  pointed  to  the  virtual  radio 
because of a need to obey the experimenter, or because it was a matter of "playing the game"
then  so  be  it.  Provided  that  they  act  in  accordance  with  the  conditions  of  the  VE,  this  is 
behavioural  presence. 

Let R be the angle between the subject's real pointing direction and the direction to the real
radio. Let V be the angle between the subject's virtual pointing direction and the direction to the
virtual radio. Small V therefore occurs when the subject points towards the virtual radio. We
use Pa = R/V as the measurement of the extent to which the subject tends towards the virtual
radio - a small V in comparison to R would result in large Pa. Therefore larger values of Pa
indicate greater tendency towards the virtual.
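
The ratio can be sketched as follows. This is a minimal illustration, assuming that the pointing directions and the directions to the two radios are available as 3D vectors; the function names and the example vectors are hypothetical, and the actual computation used in the experiment is not described here.

```python
import math


def angle_between(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


def angular_presence(real_pointing, to_real_radio,
                     virtual_pointing, to_virtual_radio):
    """Return the ratio R / V of the two pointing errors."""
    r = angle_between(real_pointing, to_real_radio)
    v = angle_between(virtual_pointing, to_virtual_radio)
    if v == 0.0:
        return math.inf  # pointing exactly at the virtual radio
    return r / v


# Hypothetical case: 90 degrees away from the real radio,
# 45 degrees away from the virtual radio.
pa = angular_presence((1, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0))
```

Because the measure is a ratio, it grows without bound as V approaches zero; the sketch returns infinity in that limiting case, which is an assumption about how such a trial might be handled.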

There were two hypotheses relating to Pa: first, that it would correlate positively with P, and
second that the greater the exposure of the subject to shadows, the greater the value of Pa. Of
course, we would also expect that the greater the exposure to shadows, the greater the value of
P.

2.4  Representation  System  Dominance 

A  clear  objection  to  this  procedure  is  that  it  could  be  measuring  the  extent  of  visual  or  auditory 
dominance rather than presence. Faced with conflicting information from two senses, the
action is likely to depend on which sensory system is "dominant". In previous work
[9,10] we have explored the relationship between dominant representation systems and the
extent of subjective presence, and have always found a very strong relationship. This is based
on the idea that people differ in the extent to which they require visual, auditory or
kinesthetic/tactile information in order to construct their world models, and that each person
may have a general tendency to prefer one type of representation (say visual) over another (say
auditory). We found that in experiments where the virtual reality system presented almost
exclusively  visual  information,  the  greater  the  degree  of  visual  dominance  the  higher  the  sense 
of  presence,  whereas  the  greater  degree  of  auditory  dominance,  the  lower  the  sense  of 
presence. 


In this shadow experiment therefore we employed an updated version of the questionnaire we
used in [10], which is given to the subjects before they attend the experimental session and
attempts to elicit their preferences regarding visual, auditory and kinesthetic modes of thinking.
This questionnaire presents 10 situations, each one having three responses (one visual, one
auditory, and one kinesthetic response). Subjects are asked to rank their most likely response
as 1, next most likely as 2, and least likely as 3. From this a V score is constructed as the total
number of V=1 scores out of 10, and similarly for A and K. Alternatively the sums of the
responses may be used. These V and A variables can therefore be used to statistically factor out
the possible influence of visual or auditory dominance on the radio angles.
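
The construction of the two kinds of score can be sketched as follows. This is a minimal illustration; the data layout (one dictionary of ranks per situation), the function name `representation_scores` and the example answers are assumptions, not the format of the actual questionnaire.

```python
def representation_scores(responses):
    """Build count and sum scores from ranked questionnaire responses.

    responses: one dict per situation, mapping 'V', 'A' and 'K' to the
    rank (1, 2 or 3) the subject gave to that type of response.
    Returns (counts, sums): counts[m] is the number of situations in
    which modality m was ranked 1; sums[m] is the sum of ranks given to m.
    """
    counts = {"V": 0, "A": 0, "K": 0}
    sums = {"V": 0, "A": 0, "K": 0}
    for ranks in responses:
        if set(ranks) != set(counts) or sorted(ranks.values()) != [1, 2, 3]:
            raise ValueError("each situation needs ranks 1, 2, 3 for V, A, K")
        for modality, rank in ranks.items():
            sums[modality] += rank
            if rank == 1:
                counts[modality] += 1
    return counts, sums


# Hypothetical subject: visual first in 6 situations, auditory first in 4.
answers = [{"V": 1, "A": 2, "K": 3}] * 6 + [{"V": 3, "A": 1, "K": 2}] * 4
counts, sums = representation_scores(answers)
```

With 10 situations the three count scores always total 10 and the three sum scores always total 60, so in either form only two of the three scores vary freely.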

The  hypothesis  with  respect  to  V,  A  and  K  would  be  that  V  and  K  would  be  positively 
correlated  with  presence  (however  it  is  measured)  whereas  A  would  be  negatively  correlated,  in 
line  with  our  previous  findings.  Note  that  by  construction,  there  are  only  2  degrees  of  freedom 
amongst  V,  A  and  K. 

3.  Apparatus 
3.1  Equipment 

The  experiments  described  in  this  paper  were  implemented  on  a  DIVISION  Provision  system,  a 
parallel architecture for implementing virtual environments running under the dVS (v0.1)
operating  environment.  The  Provision  system  is  based  on  a  distributed  memory  architecture  in 
which  a  number  of  autonomous  processing  modules  are  dedicated  to  a  part  of  the  virtual 
environment  simulation.  These  processing  modules  or  Transputer  Modules  (TRAMs)  are  small 
self-contained  parallel  processing  building  blocks  complete  with  their  own  local  memory  and 
contain  at  least  one  Inmos  Transputer  which  may  control  other  specialised  peripheral  hardware 
such  as  digital  to  analog  converters  (DAC).  Several  modules  exist.  These  include: 

•  the  module  to  act  as  the  module  manager. 

•  the  DAC  module  for  audio  output. 

•  polygon  modules  for  z-buffering  and  Gouraud  shading. 

•  application  specific  modules  for  the  user  applications. 

The  dVS  operating  environment  (Grimsdale,  1991)  is  based  on  distributed  Client/Server 
principles.  Each  TRAM  or  processing  cluster  is  controlled  by  an  independent  parallel  process 
known  as  an  Actor.  Each  provides  a  set  of  services  relating  to  the  elements  of  the  environment 
which  it  oversees.  Such  elements  presently  consist  of  lights,  objects,  cameras,  controls  (i.e. 
input  devices),  and  collisions  between  objects.  Thus,  an  Actor  provides  a  service  such  as  scene 
rendering  (visualisation  actor).  Another  Actor  may  be  responsible  for  determining  when  objects 
have  collided  (collision  actor)  and  yet  another  for  hand  tracking  and  input  device  scanning.  All 
these  Actors  are  co-ordinated  by  a  special  Actor  called  the  Director.  Communication  between 
the  different  Actors  can  only  be  made  via  the  Director.  The  Director  also  ensures  consistency  in 
the  environment  by  maintaining  elements  of  the  environment  which  are  shared  by  the  different 
Actors. 
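
The routing rule described above, in which Actors never talk to each other directly and the Director holds the shared elements, can be sketched as follows. This is a single-process illustration only: the class and method names are hypothetical and are not the dVS interface, and the real Actors are independent parallel processes.

```python
class Director:
    """Holds the shared environment elements and relays every update."""

    def __init__(self):
        self.elements = {}     # authoritative copy of each shared element
        self.subscribers = {}  # element name -> list of interested actors

    def subscribe(self, actor, element_name):
        self.subscribers.setdefault(element_name, []).append(actor)

    def update(self, sender, element_name, value):
        # Record the new value, then notify every other interested actor.
        self.elements[element_name] = value
        for actor in self.subscribers.get(element_name, []):
            if actor is not sender:
                actor.receive(element_name, value)


class Actor:
    """A service that communicates only through the Director."""

    def __init__(self, name, director):
        self.name = name
        self.director = director
        self.seen = {}  # last value received for each element

    def publish(self, element_name, value):
        self.director.update(self, element_name, value)

    def receive(self, element_name, value):
        self.seen[element_name] = value


# Hypothetical example: a tracking actor reports a new hand position,
# which reaches the visualisation actor via the Director.
director = Director()
tracker = Actor("tracking", director)
visual = Actor("visualisation", director)
director.subscribe(tracker, "hand")
director.subscribe(visual, "hand")
tracker.publish("hand", (0.1, 1.2, 0.4))
```

Routing everything through one process keeps the shared elements consistent, at the cost of making the Director a potential communication bottleneck.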

The  Provision  system  includes  a  DIVISION  3D  mouse,  and  a  Virtual  Research  Flight  Helmet 
as the head mounted display (HMD). Polhemus sensors are used for position tracking of the
head and the mouse. The displays are colour LCDs with a 360x240 resolution and the HMD
provides a horizontal field of view of about 75 degrees.

All  subjects  saw  a  VB  as  self  representation.  They  would  see  a  representation  of  their  right 
hand,  and  their  thumb  and  first  finger  activation  of  the  3D  pointer  buttons  would  be  reflected  in 
movements  of  their  corresponding  virtual  finger  and  thumb.  The  hand  was  attached  to  an  arm, 
that  could  be  bent  and  twisted  in  response  to  similar  movements  of  the  real  arm  and  wrist.  The 
arm  was  connected  to  an  entire  but  simple  block-like  body  representation,  complete  with  legs 
and  left  arm.  Forward  movement  was  accompanied  by  walking  motions  of  the  virtual  legs.  If 
the  subjects  turned  their  real  head  around  by  more  than  60  degrees,  then  the  virtual  body  would 
be  reoriented  accordingly.  So  for  example,  if  they  turned  their  real  body  around  and  then 
looked  down  at  their  virtual  feet,  their  orientation  would  line  up  with  their  real  body.  However, 
turning  only  the  head  around  by  more  than  60  degrees  and  looking  down  (an  infrequent 
occurrence),  would  result  in  the  real  body  being  out  of  alignment  with  the  virtual  body. 

The  3D  mouse  is  shaped  something  like  a  gun.  There  is  a  button  in  the  position  of  the  hammer, 
which is depressed by the thumb. This causes forward motion in the direction of pointing.
There is a button on each side of this central thumb button, each activated by the thumb. The
left  one  was  used  to  fire  the  spears  -  while  this  button  was  depressed  the  spear  would  move  in 
a  direction  determined  by  hand  orientation.  The  spear  would  stop  on  release  of  this  button,  and 
could  not  be  activated  again,  thus  giving  the  subject  one  chance  per  spear.  The  right  thumb 
button  was  used  as  the  "infra-red"  radio  switch.  Corresponding  to  the  trigger  is  a  button  for  the 
forefinger.  This  is  used  to  pick  objects  -  squeezing  this  finger  button  while  the  virtual  hand 
intersects  an  object  results  in  the  object  attaching  to  the  hand.  Subjects  were  able  to  master 
these  controls  very  quickly. 


3.2  Shadow  Algorithm  and  Frame  Rates 

The  shadow  algorithm  is  described  in  detail  elsewhere  [3].  It  is  based  on  a  dynamic  Shadow 
Volume BSP tree [2], constructed from polygons in arbitrary order, that is without the
necessity  of  a  separate  scene  BSP  tree.  Shadows  are  created  as  polygons  in  object  space. 
Creation  of  new  shadows  and  changes  to  shadows  are  communicated  dynamically  to  the 
renderer  via  the  Director. 

For  reasons  described  below,  the  entire  scene  was  small,  consisting  of  413  triangles,  of  which 
only  52  would  be  likely  to  influence  shadow  creation.  The  frame  rate  achieved  without 
shadows was 9Hz. The frame rate with shadows, 6 to 8Hz, was not very satisfactory, but this
was due to the particular version of the dVS software architecture in use on this machine at the
time of the experiment.

Without rendering, the shadow algorithm runs on this machine at a frequency of between 19
and  21  Hz  depending  on  the  complexity  of  the  view  at  any  moment.  The  renderer  does  not 
however  run  at  this  frequency  during  dynamic  changes  of  a  virtual  object,  due  to  update 
problems  associated  with  the  extant  implementation  of  the  dVS  dynamic  geometry  object. 
Therefore,  when  rendering  and  the  associated  communication  time  is  included,  the  frame  rate  is 
6  to  8Hz.  (A  new  version  of  dVS  is  intended  to  solve  this  problem). 

dVS v0.1 maintains the concept of a "dynamic geometry object". This is a vertex-face structure
representing a (possibly empty) set of polygons. The actual polygons belonging to this object
can be created or modified at run time. When such a change is made to a dynamic object, there
is an "update" generated that sends the object to the Director for distribution to the Visualisation
Actor and then on to the renderer.

Upon  any  change  of  a  virtual  object  the  shadow  algorithm  recomputes  the  shadow  scene 
outputting  any  modified  shadow  polygons,  i.e.  any  polygons  that  have  been  deleted  and  any 
that have been created. This information is transmitted to the shadow generation module which
will  mark  deleted  polygons  as  invisible  to  be  re-used  later  by  new  shadow  polygons.  The 
module  uses  a  linked  list  structure  of  dynamic  objects  -  the  shadow  object.  Each  element  in  the 
list  is  a  dynamic  object  consisting  of  32  shadow  polygons.  This  linked  list  structure  is 
necessary  in  order  to  break  down  the  entire  list  of  potential  shadow  polygons  into  smaller 
chunks,  rather  than  have  one  dynamic  geometry  object  for  all  possible  shadows,  since  the 
dynamic  geometry  implementation  can  only  send  updates  of  an  entire  dynamic  object  to  the 
Visualisation  Actor.  Note  that  a  change  in  one  single  shadow  polygon  will  result  in  the 
communication  of  a  complete  32-polygon  dynamic  object.  If,  unfortunately,  33  shadow 
polygons  change,  then  two  dynamic  objects  consisting  of  64  polygons  are  communicated,  and 
so  on. 
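
The communication cost of this chunked scheme can be sketched as follows. This is a minimal illustration, assuming that shadow polygons occupy consecutive zero-based slots in the overall list, so that slot s belongs to chunk s // 32; the function name and the slot numbering are hypothetical.

```python
CHUNK_SIZE = 32  # shadow polygons per dynamic object, as stated above


def polygons_communicated(changed_slots, chunk_size=CHUNK_SIZE):
    """Return (dirty_chunks, polygons_sent) for a set of changed slots.

    A chunk must be resent in full if any one of its slots has changed,
    because only whole dynamic objects can be updated.
    """
    dirty = sorted({slot // chunk_size for slot in changed_slots})
    return dirty, len(dirty) * chunk_size


one_change = polygons_communicated([5])          # a single polygon changes
thirty_three = polygons_communicated(range(33))  # 33 consecutive polygons
scattered = polygons_communicated([0, 40])       # 2 polygons, 2 chunks
```

The last case shows the worst behaviour of the scheme under these assumptions: two changed polygons that happen to fall in different chunks cost as much communication as 64.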

Note  that  there  is  one  important  implication  of  this  for  the  spatial  judgement  component  of  the 
experiment  -  obviously  the  spear  travels  more  slowly  when  there  are  shadows.  Without 
shadows  the  mean  velocity  is  92  cm/sec,  and  with  shadows  47  cm/sec.  Therefore  it  can  validly 
be  argued  that  differences  in  targeting  performance  result  from  the  velocity  rather  than  the  use 
of  shadows.  However,  the  effect  of  this  can  be  examined  statistically.  With  regard  to  the 
influence  on  presence  we  would  argue  that  the  slower  frame  rate  in  the  case  of  shadows  would 
tend  to  have  a  negative  effect  on  presence. 


4.  Results 

4.1  Spatial  Variables 

Spear  Selected 


Shadows  made  no  difference  at  all  to  the  selection  of  the  "correct"  spear  (the  one  closest  to  the 
wall). 

Distances  from  Target 

Consider first C, the distance of the point of the spear from the centre of the target. A regression
analysis was used to examine the effect of velocity, showing that velocity within each of the
shadow/no-shadow groups did not have a statistically significant effect. The mean
distance  without  shadows  is  152cm  and  115cm  with  shadows.  However,  the  difference 
between  these  two  is  not  statistically  significant. 

Consider  next  D,  the  perpendicular  distance  of  the  point  of  the  spear  from  the  wall  of  the  target. 
This could be positive (the spear stops in front of the target) or negative (the spear stops behind).
Carrying  out  a  within-group  regression  analysis  to  examine  the  effect  of  velocity  again  shows 
that  velocity  is  not  statistically  significant.  The  means  are  -39.9cm  without  shadows,  and 
3.3cm  with  shadows.  The  standard  errors  are  3.6  and  3.5  respectively  and  the  difference  is 
significant  at  5%.  The  medians  of  the  shadow  and  non-shadow  D  values  are  -3cm  and  -38cm 
respectively. 


Although  the  within-group  velocity  appeared  not  to  be  statistically  significant  in  each  case, 
there  is  still  some  doubt  about  whether  the  inference  about  better  performance  in  the  case  of 
shadows is safe. The variation of velocity within groups was not very great (the minimum and
maximum velocities were 81.6 and 99.0 cm/sec for the non-shadow group, and 36.0 and 60.4 cm/sec for the
shadow  group).  Subsequent  experiments  should  attempt  to  produce  a  greater  similarity  in 
performance  between  the  two  groups. 

4.2  Presence 
Subjective  Presence 

P  is  the  number  of  "high"  questionnaire  scores,  as  a  count  out  of  6.  We  therefore  treated  P  as  a 
binomially  distributed  dependent  variable,  and  used  logistic  regression. 

In logistic regression [4], the dependent variable is binomially distributed, with expected value
related by the logistic function to a linear predictor. Let the independent and explanatory
variables be denoted by x1, x2, ..., xk. Then the linear predictor is an expression of the form:

\[ \eta_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} \qquad (i = 1, 2, \ldots, N) \tag{1} \]

where N (=8) is the number of observations. The logistic regression model links the expected
value E(Pi) to the linear predictor as:


\[ E(P_i) = \frac{n}{1 + \exp(-\eta_i)} \tag{2} \]


where  n  (=6)  is  the  number  of  binomial  trials  per  observation. 
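
Equations (1) and (2) can be evaluated with a short sketch. This is an illustration only: the function name `expected_presence` and the coefficient values in the example are hypothetical and are not the fitted values of this study.

```python
import math


def expected_presence(beta0, betas, xs, n_trials=6):
    """Expected count E(P) = n / (1 + exp(-eta)).

    eta is the linear predictor beta0 + sum_j betas[j] * xs[j].
    """
    if len(betas) != len(xs):
        raise ValueError("need one coefficient per explanatory variable")
    eta = beta0 + sum(b * x for b, x in zip(betas, xs))
    # Evaluate the logistic function in a form that cannot overflow.
    if eta >= 0:
        return n_trials / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return n_trials * e / (1.0 + e)


# Hypothetical coefficients: a linear predictor of zero gives n / 2.
midpoint = expected_presence(0.0, [1.0, -1.0], [2.0, 2.0])
```

Whatever the value of the linear predictor, the expected count stays between 0 and n, which is what makes the logistic link suitable for a score that is a count out of six.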


Maximum likelihood estimation is used to obtain estimates of the β coefficients. The deviance
(minus twice the log-likelihood ratio of two models) may be used as a goodness of fit
significance test, comparing the null model (βj = 0, j = 1,...,k) with any given model. The
change in deviance for adding or deleting groups of variables may also be used to test for their
significance. The (change in) deviance has an approximate χ² distribution with degrees of
freedom dependent on the number of parameters (added or deleted).
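
The deletion test can be sketched using the values reported in Table 2. This is a minimal illustration; the function name is hypothetical, and the critical value is taken from the table rather than computed.

```python
CHI2_5PCT_1DF = 3.841  # 5% critical value on 1 d.f., as quoted in Table 2


def term_is_significant(change_in_deviance, critical_value=CHI2_5PCT_1DF):
    """A term is significant if deleting it raises the deviance
    by more than the critical value."""
    return change_in_deviance > critical_value


# Changes in deviance reported in Table 2 for deleting each term.
reported = {"NS": 4.123, "A": 9.088}
significant = {term: term_is_significant(change)
               for term, change in reported.items()}

# Goodness of fit: the overall deviance is compared with the 5% critical
# value quoted in Table 2; a smaller deviance indicates an adequate fit.
adequate_fit = 3.454 < 11.070
```

On these figures both terms exceed the critical value, NS only narrowly, which matches the statement that these were the only significant variables.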


Table 2

Logistic Regression Equations

η̂ = fitted values for the presence scale
A = Auditory Sum, NS = number of shadows
Standard Errors shown in brackets

Model

η̂ = 15.0 + 0.7*NS - 9.5*A
(3.7)  (0.4)

Overall Deviance = 3.454, d.f. = 5
χ² at 5% on 5 d.f. = 11.070

Deletion of     Change in     Change in     χ² at 5%
Model Term      Deviance      d.f.          level

NS              4.123         1             3.841
A               9.088         1             3.841


Table 3

Normal Regression Equations

P̂a = fitted values for the angular discrepancy
NS = number of shadows

Group                  Model

Visually dominant      P̂a = -13.6 + 10.6*NS
                       (3.7)
Auditory dominant      P̂a = 9.427 + 0.08*NS
                       (3.7)

Multiple Correlation Coefficient, r² = 0.29, d.f. = 36


Table  2  shows  the  result  of  the  fit  with  P  as  the  dependent  variable,  and  the  number  of  shadow 
runs  (NS)  and  the  auditory  sum  score  (A)  as  the  explanatory  variables,  across  the  8  subjects. 
These  were  the  only  statistically  significant  variables  found,  and  this  supports  the  hypothesis 
that  subjective  presence  is  positively  related  with  the  shadow  effect.  As  we  have  found 
previously,  given  this  exclusively  visual  VE,  the  greater  auditory  dominance,  as  measured  by 
the  sum  of  A  responses  to  the  pre-questionnaire,  the  less  the  reported  subjective  presence. 

Angular  Discrepancy 

Here we take Pa as the dependent variable and carry out a Normal regression with number of
shadows (NS) and the representation system scores as the explanatory variables. NS proved
once again to be significant and positively related to Pa. However, the V, A and K variables
were not significant. Nevertheless it seemed important to try to rule out the possibility that the
result with the angular discrepancy was simply due to visual or auditory dominance. Therefore
a new factor was constructed, "sensory dominance", which has the value 1 if V>A and 2 otherwise.
Hence this directly refers to visual or auditory dominance. The result of the regression analysis
including this was interesting: for those who were visually dominant, there is a significant
positive relationship between Pa and NS, whereas there is no significant relationship for those
who were dominant on the auditory score. This is shown in Table 3. (It so happened that 4 of
the subjects were visually dominant).

5.  Conclusions 

There are three main issues. First, the point of this paper is not that we have an algorithm that
can generate shadow umbrae rapidly in dynamically changing scenes. Even in this very small
scene the rendering frame rate was nowhere near adequate. There is clearly a lot of work to do
in the location of this algorithm in the system architecture, in order to obtain maximum
performance by minimising communication bottlenecks.

Second,  although  we  have  considered  depth  and  spatial  perception  problems  in  the  experiment, 
again,  this  is  not  the  major  point.  It  is  more  or  less  obvious,  from  everyday  reality,  and  from 
perceptual  studies  that  shadows  do  indeed  enhance  depth  perception,  and  that  we  are  better  off 
with them than without them. Moreover, our experimental design in this regard was not ideal,
since  we  did  not  control  a  factor  (velocity)  that  potentially  has  an  impact  on  the  results. 

Third,  the  real  point  of  the  experiment  was  the  examination  of  the  relationship  between 
dynamic  shadows  and  the  sense  of  presence.  This  result  is  not  obvious,  and  was  motivated  by 
the  idea  that  presence  is  (amongst  other  things)  a  function  of  immersion,  and  immersion 
requires vividness. We used two independent measures - one subjective from the post-experiment
questionnaire, and the other objective, as a ratio of angles of real to virtual pointing
directions.  Each  method  gave  similar  results,  and  the  two  measures  were  significantly 
correlated.  Moreover,  we  found  that  for  those  people  who  were  more  visually  dominant  their 
(angular  ratio)  presence  increased  with  exposure  to  shadows  but  that  this  did  not  hold  for  those 
who  were  dominant  on  the  auditory  scale.  Increase  in  the  subjective  presence  scale  was  also 
associated  with  an  increase  in  shadow  exposure,  but  with  a  decrease  in  the  auditory  scale. 
These  results  also  support  our  earlier  findings  regarding  the  importance  of  the  sensory  system 
preferences  in  explaining  presence. 

We  suspect  that  much  stronger  results  on  presence  would  have  been  obtained  had  we  been  able 
to  allow  the  virtual  body  to  cast  shadows.  However,  this  was  not  practical  given  the 
communication  bottleneck  problems  discussed  in  §3.2. 

If  an  application  does  not  require  presence,  there  is  little  point  in  using  a  virtual  reality  system. 
If  a  virtual  reality  system  is  used  for  an  application,  then  there  is  little  point  to  this  unless  it  can 
be  shown  that  a  sense  of  presence  is  induced  for  most  of  the  potential  participants.  If  the  results 
of  our  shadow  experiment  are  confirmed  by  later  studies  then  it  will  have  been  shown  that  the 
great computational expense of shadow generation is worthwhile for those applications where
the participants are likely to be "visually dominant".
Appendix A: Presence Questions

All  questions  were  answered  on  a  1  to  7  scale,  not  reproduced  here  for  space  reasons. 

1.  Please  rate  your  sense  of  being  there  in  the  virtual  reality. 

2.  To  what  extent  were  there  times  during  the  experience  when  the  virtual  reality  became  the 
"reality"  for  you,  and  you  almost  forgot  about  the  "real  world"  of  the  laboratory  in  which  the 
whole  experience  was  really  taking  place? 

3. When you think back about your experience, do you think of the virtual reality more as
images that you saw, or more as somewhere that you visited?

4.  During  the  course  of  the  experience,  which  was  strongest  on  the  whole,  your  sense  of  being 
in  the  virtual  reality,  or  of  being  in  the  real  world  of  the  laboratory? 

5.  When  you  think  about  the  virtual  reality,  to  what  extent  is  the  way  that  you  are  thinking 
about  this  similar  to  the  way  that  you  are  thinking  about  the  various  places  that  you've  been 
today? 


6.  During  the  course  of  the  virtual  reality  experience,  did  you  often  think  to  yourself  that  you 
were  actually  just  standing  in  a  laboratory  wearing  a  helmet,  or  did  the  virtual  reality 
overwhelm  you? 


Figure 1: Plan View of the Virtual Environment
Abstract

The paper describes some innovative concepts in the design of an object-oriented graphics
kernel that could be useful for future applications in a network-transparent virtual reality
environment. First, the benefits of object-orientation in graphics will be briefly explained.
Afterwards, some concepts of the YART graphics kernel [Bei94a, Bei94d] will be described,
which allow the use of YART in a heterogeneous, network-distributed virtual reality
environment. The main features are graphical output, interaction and interpretative language
binding. The presentation of an initial VR package on top of YART and some views on future
research fields conclude the paper.

1  Object-Orientation  in  Graphics 

This  section  briefly  describes  the  benefits  of  object- 
orientation  in  3D  graphics  and  gives  an  overview  of 
the  current  state  of  the  art. 

1.1  Conformity  with  the  Mental 
Model  of  Computer  Graphics 

The object-oriented paradigm defines objects as the integration of data and functionality.
As shown in [Wis90], these objects match the mental model of the graphics programmer to a
high degree. This means:

• Graphical representations can be seen as entities or objects. They may be manipulated
and attributed as wholes.

• They have their own attributes which influence the object and nothing else¹.

• Attributes may directly be assigned to or queried from the object itself (principle of
locality). This does not mean that objects cannot share physically existing attributes, as
shown in [Bei94c].

•  Operations  can  be  applied  directly  to  the  results 
of  interactions  (e.g.  a  pick  operation).  This  is 
known  as  symmetry  between  input  and  output 
[Wis90]. 

Conventional systems often violate the principles of locality and input-output symmetry.
For instance, in PHIGS (PLUS) [ISO89, ISO92] the attributes that are valid for a given
structure cannot be queried because the temporary results of the traversal process cannot
be accessed. The pick operation in PHIGS delivers a so-called pick path, from which the
picked output object has to be extracted. Assigning an attribute to the picked object may
cause fatal results in the central structure store because this structure may be multiply
referenced and an automatic individualization is not provided in PHIGS.

¹ An exception is the inheritance of attributes inside a part-of hierarchy.

1.2  Managing  the  Complexity 

Using inheritance, a very high level of abstraction of the (application-defined) graphical
primitives can be obtained. This includes not only graphical output functionality, but also
semantics. Traditional systems that are based on display lists like IRIS GL [GL91] and PHIGS
also allow the creation of arbitrarily complex output structures by recursive nesting of
display lists (building a directed acyclic graph). However, these structures cannot have
internal semantics and their parameterization is limited to the external assignment of
attributes. Internal semantics and arbitrary parameterizability are useful:

• to implement dependencies on external conditions like modeling in space and time,

For instance, a primitive car can rotate its wheels dependent on its forward/backward
motion. A data glyph can change its visual representation in accordance with its new
location and orientation in the data field [NBB94].

• to define abstract methods that hide internal structure,

A primitive house may offer a method openDoor <angle> that hides the primitives and the
actual modeling of the door's representation.

• to apply the assignment of (abstract) attributes [Bei94c] in a specific way and

For instance, the assignment of an attribute color to a primitive pushbutton may apply the
specified color to the pushbutton's foreground and the inverse color to the simulated shadow
and to the text label.

• for a user-specific parameterization of structures (i.e. classes).

As an example, for a class WallWithDoor a constructor parameter may specify the distance of
the door from the wall's origin.
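
Two of the cases above, a car whose wheels follow its motion and a pushbutton that interprets a color attribute in its own way, can be sketched as small C++ classes. The class and method names here are hypothetical illustrations and are not taken from the YART API.

```cpp
#include <array>

// Hypothetical RGB color with components in [0, 1].
using Color = std::array<double, 3>;

// A primitive with internal semantics: the wheels turn as the car moves.
class Car {
public:
    explicit Car(double wheelRadius) : wheelRadius_(wheelRadius) {}

    // Moving by `distance` (negative = backward) rolls the wheels by
    // distance / radius radians; callers never touch the wheels directly.
    void move(double distance) {
        position_ += distance;
        wheelAngle_ += distance / wheelRadius_;
    }

    double position() const { return position_; }
    double wheelAngle() const { return wheelAngle_; }

private:
    double wheelRadius_;
    double position_ = 0.0;
    double wheelAngle_ = 0.0;
};

// A primitive that applies an abstract attribute in a specific way:
// the color goes to the foreground, its inverse to shadow and label.
class PushButton {
public:
    void setColor(const Color& c) {
        foreground_ = c;
        Color inverse = {1.0 - c[0], 1.0 - c[1], 1.0 - c[2]};
        shadow_ = inverse;
        label_ = inverse;
    }

    const Color& foreground() const { return foreground_; }
    const Color& shadow() const { return shadow_; }
    const Color& label() const { return label_; }

private:
    Color foreground_{0.5, 0.5, 0.5};
    Color shadow_{0.5, 0.5, 0.5};
    Color label_{0.5, 0.5, 0.5};
};
```

In both classes the caller states an intent (move, set color) and the primitive decides how its parts change, which a display list with externally assigned attributes cannot express.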


1.3 Separation between Modeling and
Rendering

This separation is useful from the view of software engineering. It allows for the
construction of customized modeling packages on top of a predefined, more or less generic
and extensible graphics kernel. On the other hand, a single graphical scene can be rendered
in various ways, e.g. producing a real-time low-quality output based on a local shading model
(e.g. wireframe or Gouraud shading [Gou71]) or a high-quality output based on a global
illumination model like ray-tracing [App68] or radiosity [GTGB84]. Ray-tracing is very
time-intensive; an entire recomputation² must be done for each frame. Radiosity can be very
fast for static scenes; once the distribution of the light has been computed, frames can be
generated as fast as in Gouraud shading.

In practice there is a need for both extremes. Object-oriented systems can provide different
kinds of rendering in parallel and in a very consistent manner. This is useful for creating
animations using the fastest renderer and afterwards computing the same frame sequence in
ray-tracing mode. Composite or user-defined primitives behave equally, independent of the
type of rendering used. An elegant way to integrate modeling and rendering interfaces³ for a
given class is to use multiple inheritance.

In contrast, most traditional graphics systems (ray-tracing packages, ISO and industrial
shading libraries) are strongly bound to one kind of rendering and are not extensible, in
either modeling or rendering. Thus, wireframe modelers are used to create scenes that will
be rendered later with ray-tracers like POV [pov92] in a batch process.

² with exception of scene subdivisions
³ sets of methods for the communication between the primitives and modelers or renderers

1.4 Interaction

Object-oriented mechanisms can also be the basis for an extensible and highly customizable
interaction concept. This means:

• New interaction classes may be implemented which have a semantics that is
application-dependent and can be much more complex in comparison to conventional
interaction classes such as pick and locator.

• New and advanced physical input devices can be represented by extended interaction
classes.

• Filters can realize a flexible mapping from different interaction classes to a single
interaction class.

• Composite mechanisms may increase the level of interaction by providing more complex
interaction classes that hide their implementation (low-level interaction objects).

• The feedback may be realized in a user-defined way. Even user-defined primitives can
customize the feedback.

• In accordance with the above mentioned control of graphical output, an immediate coupling
between input and output in accordance with the paradigm of external control may be
realized.

In contrast, traditional systems are often based on the principle of internal control
(managing an exclusive and central event-loop) and define a canonical set of interaction
classes and feedback variants, e.g. the GKS input model [ISO85].

1.5  State  of  the  Art 

Pioneering work in the area of object-oriented graphics was done with the GEO++ system
[Wis90]. This includes mainly the subjects attributes, part-of hierarchies and handling of
multiple references. Though this system is often cited, its concepts have unfortunately not
been considered subsequently in the design of other systems. GEO++ simulates the principal
GKS/PHIGS functionality on top of the Smalltalk [GR89] graphics kernel.

Several class libraries encapsulate conventional shading libraries, e.g. GROOP [KW93] which
is built on top of IRIS/AIX GL, and provide additional functionality (e.g. animation
support).

In contrast, other systems are independent of a particular illumination model. They allow
the rendering of a graphical scene with different renderers. Examples are the AVS graphics
kernel Dore [Kap91], GRAMS [EK92] and YART. An upcoming ISO standard in this area is PREMO
[H+94, ISO94].

YART is a general purpose 3D graphics kernel that supports various kinds of rendering and
is ported to multiple platforms, e.g. OpenGL [NDW93], IRIS GL, PHIGS PLUS and X11 [Nye90].
YART also contains an extensible interaction model and offers an interpretative language
binding (Tcl [Ous91b]) that is compatible with the C++ [ES90] API.

In the following we want to consider some design and implementation concepts embodied in
YART which make this kernel useful for a heterogeneous, network-distributed virtual reality
environment.

2  Graphical  Output 

2.1  Independence  of  Shading  Model 
and  Low-Level  Library 

As seen from the API programmer's view, the YART graphical primitives are independent of
the shading library underlying YART and even of the implemented kinds of rendering. As
mentioned above, different protocols for communication with renderers (and modelers) can be
integrated into the YART base primitives via multiple inheritance. YART currently supports
{wireframe, Gouraud} shading, ray-tracing and non-ideal diffuse radiosity. Fig. 1 depicts the
integration of rendering and modeling interfaces in the YART primitives. Additionally, a
perspective camera is provided that implements these illumination models. This camera can be
switched from one model to another simply by calling one method.
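
The idea of integrating several renderer protocols into one primitive via multiple inheritance, with a camera that switches between them, might look as follows. This is a minimal sketch under our own assumptions: the interface names Shadable and RayTraceable, the Camera class and the returned strings are hypothetical and do not reproduce the YART class hierarchy.

```cpp
#include <string>

// Hypothetical protocol a primitive offers to a shading renderer.
class Shadable {
public:
    virtual ~Shadable() = default;
    virtual std::string shade() const = 0;
};

// Hypothetical protocol a primitive offers to a ray-tracer.
class RayTraceable {
public:
    virtual ~RayTraceable() = default;
    virtual std::string trace() const = 0;
};

// A primitive integrates both protocols via multiple inheritance.
class Sphere : public Shadable, public RayTraceable {
public:
    std::string shade() const override { return "sphere as Gouraud-shaded mesh"; }
    std::string trace() const override { return "sphere by ray intersection"; }
};

enum class RenderMode { Shading, RayTracing };

// One camera, several illumination models; a single call switches them.
class Camera {
public:
    void setMode(RenderMode mode) { mode_ = mode; }

    // Uses whichever protocol matches the current mode.
    template <typename Primitive>
    std::string render(const Primitive& p) const {
        return mode_ == RenderMode::Shading
                   ? static_cast<const Shadable&>(p).shade()
                   : static_cast<const RayTraceable&>(p).trace();
    }

private:
    RenderMode mode_ = RenderMode::Shading;
};
```

The scene description (here a single Sphere) stays the same; only the call to setMode decides which protocol the camera uses.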


Figure  1:  Topical  configuration  of  the  YART  graphics 
kernel. 

Thus, on different platforms specialized illumination models can be used for the same scene
description. A highly parallel system may permanently run in ray-tracing mode, while most
platforms (PCs, workstations) use a shading mode. In the future, new illumination models can
be integrated into the general YART distribution or in special implementations.

To  maximize  portability  YART  uses  a  low-level 
API  as  the  interface  to  platform-dependent  low-level 
graphics,  interaction  and  window  handling  routines. 
Using  this  API,  ports  to  various  platforms  could  be 
implemented  with  little  effort.  Less  than  5%  of  the 
current  YART  code  is  platform-dependent  and  this 
portion will decrease in parallel with the implementation of high-level facilities such as
advanced modeling classes and parametric curves and surfaces.

2.2 Transfer of High-Level Primitives

In contrast to most traditional shading libraries, which define universal primitives such as
polyline, polygon and quadmesh, the YART graphics kernel supports analytical and other
high-level primitives⁴. Examples of analytically defined primitives are sphere, cylinder,
cone, torus and so on; other high-level primitives are mostly constructed by reading an
external data file. Examples are Hershey [Her67] or TrueType [MS92] 3D text, OFF [Ros89]
based primitives or primitives constructed from scientific data sets. In most cases it is
much more efficient to transfer just these high-level primitives instead of their tessellated
low-level representations. For instance, a sphere is created in YART by a string like
Sphere s 1.5 plus some additional modeling⁵ (e.g. s -ambient {1 0 0}). To transfer this
sphere via the DGL protocol (distributed GL [GL91]), by comparison, a mesh in the current
resolution of the sphere (e.g. 40x40 vertices) has to be sent over the net, including vertex
normals and colors.

⁴ also supported by other object-oriented systems or (conventional) ray-tracers

Additionally, for frequently used data sets, such as fonts or OFF data, the transfer of the
data sets can be avoided because they may exist both on the server and client sides.
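
To give a feeling for the difference in transfer volume, the following sketch compares the size of a tessellated sphere with the size of its command string. The byte counts rest on our own assumptions (position, normal and RGB color per vertex, 4 bytes per value, no protocol overhead) and do not describe the actual DGL wire format.

```cpp
#include <cstddef>
#include <string>

// Bytes needed for a tessellated mesh of rows x cols vertices, assuming
// each vertex carries position, normal and RGB color as 4-byte values.
std::size_t meshPayloadBytes(std::size_t rows, std::size_t cols) {
    const std::size_t valuesPerVertex = 3 + 3 + 3;  // position, normal, color
    const std::size_t bytesPerValue = 4;            // assumed 32-bit floats
    return rows * cols * valuesPerVertex * bytesPerValue;
}

// Bytes needed to send the high-level primitive as a command string.
std::size_t commandPayloadBytes(const std::string& command) {
    return command.size();
}
```

Under these assumptions meshPayloadBytes(40, 40) gives 57600 bytes, against 12 bytes for the string "Sphere s 1.5".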

2.3  Efficient  Modeling 

One design aim of the YART kernel was to keep the memory needed for modeling data to an
absolute minimum. Inside hierarchies of graphical primitives the geometrical modeling and
attributes are inherited (hierarchical modeling). This makes programming easy. In addition,
by using references and automatic individualization of attributes and modeling matrices a
lot of memory can be saved. This also reduces network costs, because redundant matrices and
attributes do not exist explicitly⁶ and therefore do not have to be sent via the net.

2.4  Abstract  Attribute  Types 

YART provides abstract attributes in the sense that these attributes do not belong to a
given class of a primitive. Every primitive can implement the assignment of a specific
attribute in its own way. Currently, YART supports the following abstract attribute types:

• The resolution attribute allows a primitive to tessellate itself dynamically. This is used
for the mapping of analytical primitives and parametric curves and surfaces⁷. A high
resolution implies a good approximation of the mathematical shape and a higher quality for
the Gouraud shading or the radiosity computation.

• The fillstyle attribute switches a primitive into different representations. Currently
wireframe and solid representation are supported as well as some bounding box styles.

• The surface attribute contains several surface parameters such as diffuse and specular
coefficients. Others are transparency, refraction and emission.

⁵ see next section

⁶ They are expressed implicitly in the analytic primitives.

⁷ A YART extension with these classes is in development but not released yet.
• The mapping attribute specifies extended surface parameters, e.g. a solid texture, an
image or a bump mapping. The mapping of this attribute to the representation of the
primitive is currently only available in ray-tracing mode and not for all primitives.

The rendering time for a given scene is very dependent on the first attribute types
described above. Assuming that these attributes are not set for specific objects, the
(global) default attributes will be used. Thus, on different platforms a specific output can
be realized by just setting the default attributes, e.g. in an initialization file:

    AttributeObject -fillstyle 1 -resolution 0.33

This turns on the solid fillstyle and sets the resolution to 0.33. A setup like this could
be used on a Silicon Graphics machine, which has a good shading performance. On platforms
where our X11 Gouraud shader⁸ must be used in the absence of a hardware-based shader one
could use the following setup:

    AttributeObject -fillstyle 0 -resolution 0.1

This turns on the wireframe display and a minimal representation of analytical primitives
for all graphical objects that do not override these attributes.
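
The fallback from object-specific attributes to the global defaults can be sketched as below. The struct and function names are hypothetical, and the encoding of fillstyle (0 = wireframe, 1 = solid) is taken from the two setups above; YART's actual attribute mechanism works with references and is not reproduced here.

```cpp
// Resolved attribute values: fillstyle 0 = wireframe, 1 = solid.
struct Attributes {
    int fillstyle;
    double resolution;
};

// A primitive records only the attributes that were set on it explicitly.
struct PrimitiveAttributes {
    bool hasFillstyle = false;
    int fillstyle = 0;
    bool hasResolution = false;
    double resolution = 0.0;
};

// Attributes not set on the primitive fall back to the global defaults,
// so a platform-specific default setup changes every such primitive.
Attributes resolve(const PrimitiveAttributes& own, const Attributes& defaults) {
    Attributes result;
    result.fillstyle = own.hasFillstyle ? own.fillstyle : defaults.fillstyle;
    result.resolution = own.hasResolution ? own.resolution : defaults.resolution;
    return result;
}
```

A primitive that overrides only its resolution would still pick up the platform's default fillstyle.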
Figure 2: A scene rendered with different values for fillstyle and resolution.

Figure 2 shows a YART scene with different values of the attribute types fillstyle and
resolution. The rendering was done on an entry-level SGI workstation (R3000 IRIS Indigo)
using the IRIS GL interface of YART. The time is the average of about 100 rendering
processes. Each rendering process performs a complete update of attribute references and a
propagation of attribute changes.

The  resolution  attribute  may  be  synchronized  with 
the  motion  velocity  in  a  scene.  For  a  fast  motion  in 
a scene a very low resolution should be used. However, if the motion stops for some time,
the resolution
could  be  increased.  An  advanced  feature  is  to  decrease 
the  resolution  of  objects  which  are  far  away  from  the 
camera.  This  saves  memory  and  rendering  time. 
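
One possible policy for coupling the resolution to motion and distance is sketched below. The formula, the function name and the clamping range are our own assumptions for illustration; the text does not specify how YART applications choose the value.

```cpp
#include <algorithm>

// Hypothetical policy: the resolution shrinks with the camera speed and
// with the distance of the object from the camera, and is clamped to the
// range [minRes, maxRes].
double chooseResolution(double speed, double distance,
                        double minRes = 0.1, double maxRes = 1.0) {
    double byMotion = maxRes / (1.0 + speed);       // fast motion: coarse
    double byDistance = maxRes / (1.0 + distance);  // far away: coarse
    double r = std::min(byMotion, byDistance);
    return std::max(minRes, std::min(maxRes, r));
}
```

With this policy a resting camera next to the object yields the maximum resolution, and either fast motion or a large distance pushes the value towards the minimum.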

As described in [Bei94c] new attribute types can be implemented in YART in the same manner
as these predefined ones, e.g. a text attribute containing the font and a vertical scaling
value. The new attributes will be handled with the same efficiency.

⁸ which is actually very slow compared to IRIS GL on entry-level SGI

3  Interaction 

YART implements a highly flexible and extensible interaction model [BS94] that is based on
an extensible set of event types and event consuming input objects with an optional
hierarchical structure.

The focus of this model lies on events, not on interaction classes. There exists a set of
event types, which includes predefined event types for basic interactions like mouse and
spaceball control. Higher level events may be derived from these or may correspond to
advanced interactions. The orientation towards portable basic events, which have to be
implemented depending on the library below YART (e.g. IRIS GL or X11), allows the
definition of interaction classes in a portable and flexible way.

Besides  the  events  there  exists  a  set  of  input  objects 
at  run-time.  An  input  object  is  defined  as  follows: 

• It consumes a (potentially empty) set of event types.

• It generates a (potentially empty) set of event types.

• It is defined by a state that includes the last occurred event(s).

• There is a trigger defined; when the input object is triggered, a list of callback objects
will be invoked.

• It may have a father object. This may be a renderer object if the interpretation or
generation of events depends on the renderer parameters (e.g. viewing transformation). The
father object may also be another input object, allowing the definition of composite input
objects with a higher abstraction level. Of course, input objects may be driven in
stand-alone mode. Additionally, each input object may have an arbitrary number of children.

• It generates a visible feedback dependent on its state. Even user-defined primitives may
be used to create the feedback.
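
The properties listed above can be collected in a compact class. This is a hedged sketch with hypothetical names (Event, InputObject, handle); it covers consumed event types, state, trigger, callbacks and the father/children hierarchy, and leaves out event generation and visible feedback.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event: a type name plus a single numeric payload.
struct Event {
    std::string type;
    double value;
};

class InputObject {
public:
    using Callback = std::function<void(const Event&)>;

    // Event types this object consumes, and the type that triggers it.
    InputObject(std::set<std::string> consumed, std::string triggerType)
        : consumed_(std::move(consumed)), triggerType_(std::move(triggerType)) {}

    void addCallback(Callback cb) { callbacks_.push_back(std::move(cb)); }

    // Composite input objects: children see every event given to the father.
    void addChild(InputObject* child) {
        child->father_ = this;
        children_.push_back(child);
    }

    // Returns true if this object or one of its children consumed the event.
    bool handle(const Event& e) {
        bool used = false;
        if (consumed_.count(e.type) > 0) {
            lastEvent_ = e;  // the state includes the last occurred event
            used = true;
            if (e.type == triggerType_)
                for (const auto& cb : callbacks_) cb(lastEvent_);
        }
        for (InputObject* c : children_) used = c->handle(e) || used;
        return used;
    }

    const Event& lastEvent() const { return lastEvent_; }
    const InputObject* father() const { return father_; }

private:
    std::set<std::string> consumed_;
    std::string triggerType_;
    Event lastEvent_{"", 0.0};
    std::vector<Callback> callbacks_;
    InputObject* father_ = nullptr;
    std::vector<InputObject*> children_;
};
```

A pick-like object could, for example, consume mouse motion and button events but invoke its callbacks only on the button event.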

Several interaction classes have been implemented, some for demo purposes, others to
provide a minimal functionality. In every case, they are examples and can be replaced by
user-defined ones.

• Pick and (3D-)Locator realize functionality compatible with the interaction classes of the
GKS/PHIGS input model.

• The Manipulator allows intuitive⁹ manipulations of primitives in six degrees of freedom
and supports both mouse and spaceball.

• The FileDevice provides support for file descriptors, such as sockets. Using a special
parameterization¹⁰ the stdin/stdout streams can be used to control applications via textual
commands in parallel with other interaction objects, e.g. WYSIWYG user interfaces.

⁹ i.e. in accordance with the user's expectation, independently of view orientation and
local and inherited modeling

Others are a vertex manipulator for vertex-based primitives or a walk-thru object for VR
applications that maps mouse/spaceball motions to the camera parameters (view point,
reference point). Future event types and interaction classes may encapsulate more advanced,
real-world actions.

4  Interpretative  Language  Binding 

Besides the C++ API, which is a result of the implementation, YART also offers an
interpretative language binding that is consistent with the C++ one. This binding and the
related object model is described in [Bei94e]; as its basis, the extensible {C, LISP,
UNIX-Shell}-like interpreter language Tcl is used.

4.1 Interfacing with Tcl

Tcl is a typeless interpretative language that operates on strings. Tcl commands are sets
of strings. The first string identifies the command, the remaining strings are taken as
arguments and will be evaluated by the command. Consequently, object-oriented techniques
such as method calling can be realized easily using this flexible approach.
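
How method calling falls out of the rule "first string selects the command, the rest are arguments" can be shown with a toy dispatcher. This is not Tcl and does not use the Tcl C API; the Interpreter and Door classes are hypothetical, and words are simply split at whitespace.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy string-command interpreter in the spirit of Tcl (not Tcl itself):
// the first word selects the command, the remaining words are arguments.
class Interpreter {
public:
    using Args = std::vector<std::string>;
    using Command = std::function<std::string(const Args&)>;

    void registerCommand(const std::string& name, Command cmd) {
        commands_[name] = std::move(cmd);
    }

    std::string eval(const std::string& line) const {
        std::istringstream in(line);
        std::string name, word;
        Args args;
        in >> name;
        while (in >> word) args.push_back(word);
        auto it = commands_.find(name);
        if (it == commands_.end()) return "error: unknown command " + name;
        return it->second(args);
    }

private:
    std::map<std::string, Command> commands_;
};

// A hypothetical object exposed as a command: "<name> <method> <args...>".
struct Door {
    double angle = 0.0;

    std::string call(const Interpreter::Args& args) {
        if (args.size() == 2 && args[0] == "open") {
            angle = std::stod(args[1]);
            return "ok";
        }
        return "error: bad method call";
    }
};
```

Registering an object under its name turns "door open 45" into a method call on that object, which is the pattern an object-oriented binding on top of a string-based interpreter relies on.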

In contrast to many interpretative languages, the aim of Tcl is to be extended by
application-specific commands. Creating a Tcl interface to an existing C/C++ application
consumes some additional effort¹¹, but provides some important features:

• There are no turn-around times¹² when using the Tcl API. This is useful for learning and
debugging purposes, and for prototyping.

• The definition of the Tcl interface, including the mapping of C(++) data structures to Tcl
variables and values, normally forces the specification of a clean C(++) API. In most cases
this API provides a high abstraction level hiding lower-level data and functional
structures, which can be implemented efficiently in C or C++¹³.

• Once defined, the Tcl API may be used as a textual interface for user input, but also to
put a WYSIWYG user interface on top or to have network-wide access to the C++ objects. The
user interface aspect is very important, because a strong separation between application
functionality and extensible user interfaces is provided. Additionally, Tcl comes with Tk
[Ous91a], which is a widget toolkit similar to OSF/Motif, but accessible through Tcl.

• Furthermore, the definition/implementation of scene description or metafile formats is
easy when using the Tcl API.

• Advanced features of such a Tcl/C++ mapping [Bei94b] may include the interactive behavior
query of the C++ objects and the automatic creation of user interface dialog boxes or manual
pages.

¹⁰ 0 for stdin

¹¹ However, this can be minimized using object-oriented techniques as shown in [Bei94e]. The
author even implemented a 1:1 Tcl interface to OpenGL in a very generic and efficient way by
parsing the OpenGL header files and using implicit C++ operators for parameter conversion.

¹² edit-compile-link-run/debug cycles

¹³ Numerous projects attest to the usability of this design philosophy, such as a commercial
CFD system, a virtual reality application and a 3D graphics kernel.

For a network-based virtual reality environment the following points can be extracted from
the above:

•  persistent  storage  of  graphical  scenes  (generating 
calls  to  the  Tcl  API)  and 

•  network-transparent  and  platform-independent 
access  to  C++  objects  (transparent  execution  of 
Tcl  API  calls  via  the  net). 

The first point makes it possible to get the state of a scene just by generating a scene
dump. The second point allows the transfer of complete scene dumps or incremental scene
changes via the net - without consideration of binary formats and byte order.
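
A scene dump in this sense is simply the sequence of textual API calls that would rebuild the scene. The sketch below assumes a hypothetical SphereNode record and writes commands modelled on the sphere example of section 2.2; the exact command syntax that YART generates is an assumption here.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical scene entry: a named sphere with radius and ambient color.
struct SphereNode {
    std::string name;
    double radius;
    double ambient[3];
};

// Writes the scene as the textual commands that would rebuild it. Because
// the dump is text, binary layout and byte order of the hosts do not matter.
std::string dumpScene(const std::vector<SphereNode>& scene) {
    std::ostringstream out;
    for (const SphereNode& s : scene) {
        out << "Sphere " << s.name << ' ' << s.radius << '\n';
        out << s.name << " -ambient {" << s.ambient[0] << ' '
            << s.ambient[1] << ' ' << s.ambient[2] << "}\n";
    }
    return out.str();
}
```

The same generated text can be written to a file for persistent storage or sent to another host and evaluated there.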

4.2  Interpretative  Class  System 

On top of the Tcl API an interpretative and persistent class system is implemented [Bei94e,
Bei94b]. This class system allows the definition of new classes (e.g. primitives) at run
time in a portable way. Thus, it is possible to increase the abstraction level by defining
more complex primitives. However, a more important point is that semantics can be integrated
in these classes (by defining methods) in a portable (and even network-transparent) way.
Conventional scene description languages do not offer this possibility.

The class system supports single inheritance, public methods and private members. It is
compatible with the general YART C++/Tcl object model and offers persistence, too.

4.3  On  the  Way  to  a  VRML 

Based on this interpretative language binding an interactive VR Markup Language could be
specified by:

• a definition of a subset of Tcl (no use of exec, proc, pwd, cd, etc.)¹⁴

• a definition of the YART API including the class system

• a definition of specific YART extensions - so-called packages, e.g. for scientific
visualization

• a definition of mechanisms to provide data sets (separate transfers, load paths, etc.)

¹⁴ These Tcl procedures would provoke incompatible scene descriptions.
5 A Concrete Implementation: YART/VR

The package presented in the following is more an application of the YART graphics kernel
than a serious VR tool. Nevertheless it realizes some concepts which could be interesting in
this context.

5.1 Metaphors

The application is based on two metaphors: the scene and the user. A scene is a kind of
server running anywhere in the net and can be addressed via <scene-name>@<host_name>, e.g.
castle@141.24.32.29¹⁵.

A physical user is represented by an object of a Tcl
class that can be specified freely by the user. The
representation is built from YART primitives such as sphere,
block, cylinder, quadmesh, polygon, polyhedron, text
and so on. Additionally, specific methods can be
provided for the user class.

The scene currently accepts the following user
commands:

•  login/logout 

•  move  forward  (backward)  by  a  specified  distance 
in  view  direction 

•  move  upward  (downward)  by  a  specified  distance 
(i.e.  in  y  direction) 

•  rotate  left/right  by  a  specified  angle 

•  send  a  pick  ray  into  the  scene 

•  broadcast  a  textual  message  to  all  users 

•  call  a  method  of  the  user  representation  with  an 
arbitrary  number  of  arguments 

User commands^16 are sent to the scene, and the
scene sends the resulting commands back. Thus, the scene
controls all user movements and can, for instance,
prevent a user from going through a wall.
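The division of labour described above, where the client only requests a move and the scene decides the outcome, might be sketched as follows. This is a minimal illustration in Python rather than YART's Tcl; the class names, the single wall plane and the reply strings are assumptions made for the example and are not part of YART/VR.

```python
import math
from dataclasses import dataclass


@dataclass
class UserState:
    x: float
    z: float
    heading_deg: float  # assumption: 0 degrees looks along +z, 90 along +x


class Scene:
    """Hypothetical scene that owns the authoritative user position."""

    def __init__(self, wall_x: float):
        # Assumption for this sketch: one infinite wall in the plane x = wall_x.
        self.wall_x = wall_x

    def move_forward(self, user: UserState, distance: float) -> str:
        """Check a requested move and return the resulting command string."""
        rad = math.radians(user.heading_deg)
        new_x = user.x + distance * math.sin(rad)
        new_z = user.z + distance * math.cos(rad)
        # Reject the move if the path would touch or cross the wall plane.
        if (user.x - self.wall_x) * (new_x - self.wall_x) <= 0:
            return f"reject {user.x:.2f} {user.z:.2f}"
        user.x, user.z = new_x, new_z
        return f"set_position {new_x:.2f} {new_z:.2f}"
```

Because the client applies only what the scene sends back, a rejected request leaves every mirror of the user untouched.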

Scenes and users correspond to a client-server
architecture. The user application is the client, which
contains a hidden copy (a mirror scene) of the global, visible scene.
All changes result in the broadcast of small
sequences of Tcl commands (typically less than 500
bytes) to all mirror scenes. Thus, YART/VR
implements incremental scene changes. Of course, for
the initial setup of a mirror scene the current state
of the server scene must be transferred over the net.
This scene dump is typically 100 kbytes or more;
however, the transfer costs can be reduced by using a
compression utility.
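The difference between the one-time scene dump and the later incremental changes can be illustrated with a small sketch. It is written in Python for brevity; the class names and the command strings are hypothetical and do not reproduce YART/VR's actual Tcl commands or wire format.

```python
class MirrorScene:
    """Hypothetical client-side copy of the server scene."""

    def __init__(self, dump: bytes):
        # Initial setup: the complete current state arrives as one dump.
        self.commands = dump.decode("utf-8").splitlines()

    def apply(self, command: str) -> None:
        # Later changes arrive as short command sequences.
        self.commands.append(command)


class ServerScene:
    """Hypothetical server scene that broadcasts every change."""

    def __init__(self):
        self.commands = []
        self.mirrors = []

    def dump(self) -> bytes:
        # Full state transfer, needed only once per new mirror.
        return "\n".join(self.commands).encode("utf-8")

    def login(self) -> MirrorScene:
        mirror = MirrorScene(self.dump())
        self.mirrors.append(mirror)
        return mirror

    def change(self, command: str) -> None:
        # Incremental update: only the new command is sent to each mirror.
        self.commands.append(command)
        for mirror in self.mirrors:
            mirror.apply(command)
```

After login, a mirror stays consistent with the server as long as it applies every broadcast command in order.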

5.2  Implementation 

Two executables are provided when YART/VR is
installed. These are standard tools and can be replaced
by other ones.

^15 This host isn't reachable via the net.

^16 These are on a high level, hiding elementary mouse and
spaceball motions, and result partially from precomputations
on the client side, such as pick correlation.


vr_scene is called by the executable scene
description files. In principle, scene description
files are standard YART metafiles, which can be
created or modified using a text editor or a
direct-manipulation user interface. Fig. 3 shows a screen
shot of a scene representing our institute; the related
scene description file consists of 20 kbytes of
well-formatted and documented Tcl code. By the way,
this scene consists of about 6,700 YART primitives.

Additionally, a scene file may specify the
initial position and view orientation of a user who
enters the scene and the maximum number of users
allowed in the scene.


Figure  3:  A  scene  representing  our  institute. 


The predefined semantics of a pick in the scene is as
follows: if a user is hit by the ray, the picking user
gets a message containing the picked user's name, and the
picked user is informed that he was picked by the
picking user. These semantics may be overridden in a
scene file. The overriding implementation may use commands for
transferring binary large objects and for sending
messages or commands.

vr_user is a Tk based front end (fig. 4) that
provides mouse-driven interactions for moving in the
scene, scales for camera parameters and text
input/output. By convention, vr_user expects a file
${HOME}/.yartvr that contains the user's
representation in the form of a class definition (and implementation)
and supports a file ${HOME}/.vr_userrc for
configuration of the front end (see fig. 2).

The current implementation of scene-user
communication is based on Internet domain sockets. All
UNIX systems with at least X11 graphics
and BSD compatible networking are supported as platforms.

5.3  Extensions 

Both client and server may be extended without
losing compatibility with running applications.

5.3.1  User  (Front  Ends) 

Figure 4: Meeting myself in a room of our institute.

The front end (currently the vr_user program) may be
extended in the following ways:

• stereo output for SGI machines (a YART
interface to SGI stereo hardware and IRIS GL is part
of the YART distribution),

•  textual  control  of  movement  allowing  faster  and 
more  precise  motion, 

•  radiosity  and  ray-tracing  images  of  the  scene  to 
capture  an  impression  in  an  advanced  rendering 
quality  and 

• spaceball integration (YART already provides
spaceball events for that).

5.3.2  Scene 

The complete semantics of user motion and picking are
controlled by the scene. Thus, advanced scenes
(subclasses of the predefined scene class) may implement
any specific behavior. An intelligent handling of
subscenes may reduce the rendering and transfer time
drastically.

5.4  Communication 

The current socket/XDR [KP84] stream based
communication is simply one concrete subclass
of the abstract communication classes. Thus,
other implementations (e.g. use of shared memory
for local connections) could be used in future versions.
Additionally, the use of the GNU compressor gzip
for the transfer of BLOBs is hidden inside the
concrete communication classes.
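The layering described here, an abstract communication interface with compression hidden in a concrete subclass, could look roughly like the following sketch. It uses Python's standard gzip module; the class and method names are hypothetical and are not taken from the YART sources, and an in-memory queue stands in for the socket/XDR stream.

```python
import gzip
from abc import ABC, abstractmethod


class Channel(ABC):
    """Hypothetical abstract communication class."""

    @abstractmethod
    def send_blob(self, data: bytes) -> None:
        """Transfer a binary large object to the peer."""

    @abstractmethod
    def receive_blob(self) -> bytes:
        """Return the next binary large object from the peer."""


class CompressedQueueChannel(Channel):
    """Concrete channel; callers never see that the payload is gzipped."""

    def __init__(self):
        self._queue = []  # stands in for a real socket stream

    def send_blob(self, data: bytes) -> None:
        self._queue.append(gzip.compress(data))

    def receive_blob(self) -> bytes:
        return gzip.decompress(self._queue.pop(0))
```

A shared-memory variant would be another subclass of the same abstract class, so code written against the interface would not need to change.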

6  Future  Research  Subjects 

The following topics are taken from a proposal for a
project sponsored by the German research
community. The aim is to extend the YART graphics kernel
for the special needs of complex VR scenes.


6.1  Scalable  Rendering  Technology 

Up to now, there is a strong separation between
local and global illumination models and related kinds
of rendering. Most packages implement exactly one
model and sometimes a second one for pre-viewing.
YART integrates both local and global illumination
models and is open for new ones. However, the
different kinds of rendering are strongly separated in
YART. Is a scalable rendering technology realizable
that integrates existing and new kinds of rendering
and allows a linearly increasing rendering quality?

6.2  Time-Dependent  Rendering 

Normally the rendering time t is a function of the
rendering quality q and the scene complexity c. For a given
(q, c) on a specific platform, a certain amount of time
will be needed to render the scene. An interesting
question is: is it possible to choose q and c depending
on a predefined time value?
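To make the question concrete, consider the simplest possible case. The sketch below assumes, purely for illustration, a linear cost model t = k * q * c with a platform constant k, and solves it for the highest quality q that fits a given time budget. The model, the function name and all parameter names are assumptions; the paper poses this as an open research question and gives no model.

```python
def choose_quality(time_budget, complexity, cost_per_unit,
                   q_min=0.1, q_max=1.0):
    """Pick the highest quality q with cost_per_unit * q * complexity <= budget.

    Assumes the hypothetical linear model t = k * q * c; the result is
    clamped to [q_min, q_max], so a very tight budget may still overrun.
    """
    if complexity <= 0:
        return q_max  # nothing to render, full quality is free
    q = time_budget / (cost_per_unit * complexity)
    return max(q_min, min(q_max, q))
```

A real renderer would need a measured, probably non-linear cost function, and could reduce c as well (for example by dropping objects) once q reaches its lower bound.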

6.3  High-Level  Clipping  and  Data 
Management 

The rendering time of ultra complex scenes can be
reduced by using some kind of high-level clipping to
exclude invisible (hierarchies of) objects much earlier
from the rendering. A solution for this under the
condition of rectangular elements (e.g. rooms in a house)
is proposed in [Bro86].

The management of objects that exist in parallel is
closely related to the previous point. Objects
that are invisible and have no semantics of their own
do not need to exist and can therefore be removed from
memory.

7  Conclusions 

YART/VR  is  based  on 

•  Tcl  -  an  extensible  interpreter  language 

• Tk - an X11 widget set based on Tcl

•  YART  -  a  general-purpose  3D  graphics  kernel 

• IOM - Tk based tools for programming YART

All these packages are public domain and can
be obtained by ftp from
ftp://metallica.prakinf.tu-ilmenau.de/pub/PROJECTS/GOOD*.*^17
(or from other ftp sites). The directory contains a README file that
describes all the files in this directory.

A set of WWW documents about YART, IOM,
YART/VR, etc. can be reached via
http://metallica.prakinf.tu-ilmenau.de/GOOD.html

The author is available at
institute@speedy.prakinf.tu-ilmenau.de when he is at the office and would
be happy to receive visitors from the net.

Special thanks go to Frank Wicht, Jochen Pohl
(both TU Ilmenau) and P. Bryan Heidorn (University
of Pittsburgh) for supporting this work.

^17 Look for the highest number, please.

Abstract - This paper describes a new rendering tool for the fast and efficient
rendering of virtual environments. Rendering modules with dynamic
level-of-detail technology are an emerging answer to graphics hardware
that has so far been insufficient. The paper outlines the design of a fuzzy-based dynamic
level-of-detail controller and its optimisation with genetic algorithms. Furthermore,
the potential of this new technology is shown with a summary of recent
development work at the Fraunhofer-Institute for Manufacturing Engineering and
Automation, which has made possible the use of Virtual Reality for robot
simulation, off-line programming and remote operation of industrial robots.

Keywords:  Virtual  Reality,  Fuzzy-Logic,  Genetic  Algorithms,  Robot  Simulation. 


1.  INTRODUCTION 

International competitiveness is
characterised by the reduction of
innovation time. Therefore the success of
new products strongly depends on the
time necessary for their development.
Virtual Reality, as a new 3D
human-computer interface, significantly accelerates
the processes of creating and handling 3D
data in 3D space for design and evaluation
purposes.

Virtual Reality, providing advanced
human-computer interfaces, can make a
valuable contribution to improving simulation
and control tools. The operator, using a
dataglove and a head-mounted stereo display,
can act in a virtual world and is no longer a
passive observer [1].

Characteristic of Virtual Reality is the
interaction and the perception in a virtual
environment. This is only possible with
real-time simulation and rendering, which has led to
expensive but high-performance computer
technology.

The Fraunhofer-Institutes, which carry out
applied research for small and medium-sized
enterprises (SMEs), are familiar with
industrial demands. To establish VR
simulation technologies in industry, it is
necessary to develop fast rendering
algorithms so that relatively cheap
hardware can be used for real-time rendering tasks.

This  paper  outlines  a  new  dynamic 
level-of-detail  algorithm  which  is  based  on 
a  fuzzy-controller  with  parameters  which 
are  optimized  by  genetic  algorithms. 

2. REQUIREMENTS FOR VISUALISATION TOOLS

All graphical systems have limited
capabilities that affect the number of geometric
primitives that can be displayed per frame
at a specific frame rate. The general
requirement is a maximum frame rate
together with optimal quality of the displayed
scene. These two goals contradict
each other. The driving force behind IPA
development is to find the best compromise
between quality of the scene and frame
rate.

Given the existing limitations and the aim of
maximising visual output quality while
minimising the polygon count,
level-of-detail processing is one of the most
promising tools available for managing
scene complexity in order to
improve display performance.

To reduce rendering time, objects that
are visually less important in a frame can be
rendered with less detail or at a higher
degree of abstraction. The level-of-detail
approach to optimising the display of
complex objects is to construct a number of
progressively simpler versions of an object
and to select one of them for display, for
example as a function of range.
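Selection as a function of range can be sketched in a few lines. The threshold values in the usage note and the convention that index 0 is the most detailed version are assumptions for the example, not values from the paper.

```python
import bisect


def select_lod(distance, thresholds):
    """Return the index of the representation to display.

    thresholds is an ascending list of switch distances; with n thresholds
    there are n + 1 representations, index 0 being the most detailed.
    """
    return bisect.bisect_right(thresholds, distance)
```

With thresholds of 10, 50 and 200 units, an object at distance 5 uses representation 0 and an object at distance 500 uses representation 3, the simplest one.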

This  method  requires  the  creation  of 
multiple  representations  of  an  object  with 
varying  levels  of  detail  or  levels  of 
abstraction.  Rules  must  be  given  to 
determine  the  best  representation  of  the 
object  to  be  displayed  [2]. 

A prerequisite is that
enough representations of an object with
different quality levels exist. Rendering
controlled by the graphical output can only
work satisfactorily on the basis of a
comprehensive list of graphical
representations.

To use extended level-of-detail
technology, a large set of graphical
representations must be designed. Several
methods are known. The easiest method is
to design representations in different
degrees of abstraction by hand. Semi-automatic
or automatic methods are also
possible. The objective is to obtain representations
with different numbers of polygons, since the
number of polygons to be rendered is the most
important criterion for the graphical
performance of the computer.

One very simple representation of an
object is the bounding box, built from
six polygons; another is a minimal
convex hull. The latter representation looks like
a rubber balloon stretched over the
described object. Often it is enough to use
this representation in a very complex scene
or for background scenery.
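As an illustration of the simplest representation mentioned above, the following sketch computes an axis-aligned bounding box from an object's vertices and builds its six quadrilateral faces. The choice of an axis-aligned box and the function names are assumptions; the paper does not say how its bounding boxes are oriented.

```python
def bounding_box(vertices):
    """Return the (min corner, max corner) of a list of (x, y, z) vertices."""
    xs, ys, zs = zip(*vertices)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def box_faces(lo, hi):
    """Return the six quadrilateral faces of the box as lists of corners."""
    # Corner index = 4 * ix + 2 * iy + iz, where each i is 0 (lo) or 1 (hi).
    corners = [(x, y, z)
               for x in (lo[0], hi[0])
               for y in (lo[1], hi[1])
               for z in (lo[2], hi[2])]
    quads = [(0, 1, 3, 2), (4, 6, 7, 5),   # x = lo, x = hi
             (0, 4, 5, 1), (2, 3, 7, 6),   # y = lo, y = hi
             (0, 2, 6, 4), (1, 5, 7, 3)]   # z = lo, z = hi
    return [[corners[i] for i in quad] for quad in quads]
```

However many polygons the original object has, this stand-in always costs exactly six.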

Other automatically created
level-of-detail representations can be generated by
defining tolerance measures around the
surface of the described object. If
neighboring vertices lie within this predefined
tolerance, all polygons belonging
to these vertices are merged. With this
method, different level-of-detail
representations can be designed by graduated
tolerances. This procedure is very efficient
for designing large landscapes (Figure 1).
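One simple way to realise tolerance-based merging is grid-based vertex clustering, sketched below. This is a stand-in technique chosen for brevity, not necessarily the method used at the IPA: vertices that fall into the same grid cell of edge length `tolerance` are merged, and triangles that collapse as a result are dropped. The function name and the restriction to triangles are assumptions.

```python
def cluster_vertices(vertices, triangles, tolerance):
    """Merge vertices sharing a grid cell and drop collapsed triangles.

    vertices: list of (x, y, z); triangles: list of index triples.
    Returns (new_vertices, new_triangles).
    """
    cell_to_index = {}
    new_vertices = []
    remap = []
    for vertex in vertices:
        cell = tuple(int(coord // tolerance) for coord in vertex)
        if cell not in cell_to_index:
            # The first vertex seen in a cell represents the whole cell.
            cell_to_index[cell] = len(new_vertices)
            new_vertices.append(vertex)
        remap.append(cell_to_index[cell])
    new_triangles = []
    for a, b, c in triangles:
        merged = (remap[a], remap[b], remap[c])
        if len(set(merged)) == 3:  # keep only non-degenerate triangles
            new_triangles.append(merged)
    return new_vertices, new_triangles
```

Calling the function with graduated tolerances yields a series of progressively coarser representations of the same mesh.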

If the virtual world is designed from
polygonal representations of analytical
surfaces such as cylindrical, toroidal or conical
surfaces, each surface can be directly
created in different representations. These
must be fine-tuned, because adjacent
surfaces that do not fit together could result. A
broomstick, for example, could be shown as
a cylinder with a rounded end built from a
spherical surface. Using different
levels of detail for the two surfaces can create a confusing look:
holes may appear and edges might
no longer fit (Figure 2).

Figure 1: Three landscape representations

Assuming that enough different
representations of every object have been designed,
the algorithm deciding the current
representation is the most important part of
the rendering module. Rules have to be
defined to change the level of detail. The
governing factors are the following [3]:

1.) Distance from the viewer to the concerned object

The  distance  is  an  important  parameter. 
The  nearer  an  object  is,  the  more  important 
it  is  for  the  whole  scene. 

2.) Size of the object

Of  the  same  importance  is  the  size  of  an 
object.  Both  parameters  together  define  the 
size  of  the  considered  object  on  the  screen. 

3.) Position in relation to the center of the
display

The user wearing a head-mounted
display will usually look at the center of
the viewed scene. Objects at the edge of
the visible frustum are therefore assumed
to be less important than objects in the
middle of the screen.

4.) Complexity of the object

The complexity of an object is
expressed by the number of polygons used
to display the object.

5.) Special interest of the object

Some objects have a predefined special
interest. In a robotic workcell, for example, the gripper
of the robot would be more interesting
than many other objects.

6.)  Velocity  of  the  object 

Using the velocity of objects as a parameter,
the importance of fast objects can be
reduced, as these cannot be seen correctly,
for example the propeller of a plane.

Free navigation of the user in
the virtual world may result in a
continuous change of the
representations of different objects. This change can be a
visual problem because of the resulting
strong flickering of the whole scene.
As the human eye is especially sensitive to
motion, a continuous change of object
representations disturbs the impression of
the viewed scene.


Figure  2:  Arrangement  of  geo  primitives 


The problem in adjusting the rendering
module is the large number of parameters in the
abstraction algorithms. The question is how
to reach the optimum between high frame
rate and high level of detail. One condition
is that the visual output is produced for a
human being who judges subjectively. Another
is the contradiction arising from the opposing
goals of the criteria. For example, an object
at a far distance is not very important,
yet the same object may have a higher
importance because it is defined
as an object of interest. The arising
contradiction can be balanced by using
fuzzy logic algorithms.


3.  A  DYNAMIC  ADJUSTABLE 
RENDERING  SYSTEM 

The dynamic control of the graphical
level-of-detail has various advantages and
has become state of the art with the
introduction of the Silicon Graphics
rendering package Performer. The
philosophy is to use graphical abstractions
for some scene objects in a stress situation
of the graphical output. However, the only benefit
of SGI Performer is that it decides either to
give the best (normal) quality or to simplify
the scene by leaving out some details of the
scene. Different representations of objects
are possible, but very complicated to model.

Performer is designed for peaks of
stress. The disappearance of
objects in these situations cannot be
accepted in most applications. A controller
has to be designed to control the scene
permanently, because stress situations must
be expected at any time. A permanent
abstraction of the scene has to be possible.

This can best be achieved by using a
fuzzy logic controlled rendering module.
With the use of a controller, a frame rate
can be fixed. The controller's task is to
reach the desired number of frames per
second by selecting one of the different
representations of each object. It is
the user's job to give the system the best
tolerable compromise between the fastest display
frame rate and the best quality of the
output. This adjustment strongly
depends on the subjective quality
perception of the viewer.

A successful control
principle in discrete environments is the
philosophy of fuzzy logic. All incoming
values can simply be rated. All factors of
the output decision can be considered
carefully. The fuzzy logic controller gives a
soft and easy-to-control switching of the
representations of graphical objects. It
offers the best way of
rating the complexity of the current scene
and allows a soft deviation from the desired
frequency. The problem of the disturbing
flickering caused by oscillation in
conventional controllers can be minimised.

The benefit of a fuzzy logic controller is
the very easy usage of input values by means of
linguistic variables. These variables - also
called membership functions - describe the
state of every influencing parameter in the form
of natural language. The distance to an
object of the scene, for example, can be
described in words like very near, near,
middle, far or very far. Every condition is
represented by a fuzzy set. The fuzzy sets define the
membership of the conditions over an
interval of discrete values. If, for example, the
current distance is detected by the system,
the fuzzy algorithm could recognize the
fuzzy set far to 80 percent and very far to
40 percent. All other parameters like size,
complexity etc. are evaluated similarly.
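The fuzzification step can be sketched with trapezoid-shaped sets, the shape the paper later states for its membership functions. The breakpoints below are invented for illustration (the paper gives none); they are chosen so that a distance of 110 units reproduces the example of 80 percent far and 40 percent very far.

```python
def trapezoid(x, a, b, c, d):
    """Degree of membership of x in a trapezoid with a <= b <= c <= d.

    Membership is 0 outside [a, d], 1 on [b, c], and linear in between.
    """
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical fuzzy sets for the linguistic variable "distance".
DISTANCE_SETS = {
    "very near": (0, 0, 2, 5),
    "near": (2, 5, 10, 20),
    "middle": (10, 20, 40, 60),
    "far": (40, 60, 100, 150),
    "very far": (90, 140, 1000, 1000),
}


def fuzzify(x, sets):
    """Return the membership of x in every fuzzy set of one variable."""
    return {name: trapezoid(x, *points) for name, points in sets.items()}
```

Each fuzzy set is fully described by its four breakpoints, which is what makes the sets convenient targets for automatic tuning.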


Figure  3:  The  membership  function  of  the 
deciding  parameters 


All  decisions  of  any  parameter  are  brought 
together  by  rules.  In  these  rules  the 
connections  between  the  different 
parameters  (distance,  size  and  complexity 
of  objects  etc.)  of  the  scene  description  are 
evaluated  like: 

if  the  distance  is  far  and  the  size  is 
big  then  give  a  better  representation 

if  the  complexity  is  low  and  the  size  is 
small  then  give  a  more  abstract 
representation 
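The two example rules can be evaluated as in the sketch below. Interpreting "and" as the minimum of the two memberships is the most common convention in fuzzy control and is assumed here; the paper does not state which operator it uses. The dictionary keys are hypothetical names for the fuzzy sets and conclusions.

```python
def evaluate_rules(distance, size, complexity):
    """Return the firing strength of each rule conclusion.

    Each argument maps fuzzy-set names to memberships in [0, 1].
    """
    return {
        # if the distance is far and the size is big
        # then give a better representation
        "better": min(distance["far"], size["big"]),
        # if the complexity is low and the size is small
        # then give a more abstract representation
        "more_abstract": min(complexity["low"], size["small"]),
    }
```

Both rules can fire at once with different strengths, which is how contradictory criteria end up being weighed against each other.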


Any combinations of these rules are
possible. The outputs of the valid rules can
have opposite evaluations, but the
important advantage of fuzzy logic
controllers is the ability to compensate
all opposites and to find an optimal average
in evaluating the input values.

The evaluation of the complete set of
rules via defuzzification returns a discrete
value that the rendering module uses to
decide which representation is currently the
best choice. The more rules are used and
the more variables are interpreted, the
better the stability of the controller. This
stability is achieved by decreasing the
controller's oscillation and minimising
the accompanying flickering of the graphical
output.
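A very small defuzzification step might look as follows. It assumes a weighted average over singleton outputs, one of several standard defuzzification methods; the paper does not say which one it uses. Mapping each rule conclusion to a target level index, and keeping the current level when no rule fires, are likewise assumptions.

```python
def defuzzify(strengths, levels, current_level):
    """Turn rule firing strengths into a discrete level-of-detail index.

    strengths: conclusion name -> firing strength in [0, 1]
    levels:    conclusion name -> non-negative level index it argues for
    """
    total = sum(strengths.values())
    if total == 0:
        return current_level  # no rule fired, keep the representation
    crisp = sum(strengths[name] * levels[name] for name in strengths) / total
    return int(crisp + 0.5)  # round half up to the nearest level
```

With four representations (index 0 the most detailed), a strong "better" vote and a weak "more abstract" vote settle on a level near the detailed end rather than jumping between extremes.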

So  far  it  is  easy  to  define  a  controller  to 
decide  the  representations  of  single  objects. 
But  the  task  is  not  to  optimise  the  output  of 
the  objects  in  a  scene,  but  the  impression  of 
the  scene  itself.  The  controller  must  be 
optimised  to  give  the  best  subjective  scene 
impression  to  the  user. 

The missing link to an improved
controlled rendering module is the
evaluation of the complete scene. How can
a complex scene be evaluated? The limited
output resources determine the decisions
of the controller for every single object in
the current scene. The evaluation of objects
has to be seen in relation to the output of
the whole scene. This means that the
controller's decisions depend not
only on static parameters but also on the
relationships between the objects that make up the scene.

The fuzzy logic controller must be
optimised by training in well-known
virtual worlds. The decision base of the
controller consists of the membership functions
and the rule base. The rule base cannot be
changed softly; all connections are fixed. The size
and the form of the fuzzy sets in the
membership functions can be manipulated,
though. The sets are defined as trapezoid-shaped
memberships, each built from a
quadruple. The four coordinates defining
a set can be manipulated to
modify the characteristic of the controller.

How can the controller be adjusted to
make the best choice of object
representations with a minimum of
oscillation while holding a fixed frame rate
and an optimal subjective scene display?
Modifying all defining points
of the fuzzy sets in the five incoming
membership functions yields 36 parameters
that can be modified. The best output depends
on an optimum rating of these parameters.

4.  OPTIMISATION  BY  GENETIC 
ALGORITHMS 

Genetic algorithms (GA) have some
properties that make them interesting as a
technique for selecting high-performance
membership functions for fuzzy logic
controllers. Due to these properties, GAs
differ fundamentally from more
conventional search techniques. They
consider a population of points and not a
single one, and they use non-deterministic,
probabilistic rules to guide their search.
GAs consider many points from the search
space simultaneously and therefore
have a reduced chance of converging to a
local optimum [4].

GAs work on the basis of
evolutionary theory [5]. The parameter set
defining the problem to optimise - in this
special case the defining points of the
membership functions - is seen as the DNA
of a biological individual.

The result of the search procedure is the
definition of the optimal parameter set. This
set can be found by starting the evolution
with a large number of individuals with an
identical parameter set (DNA). The DNA
of individuals can be inherited. During
inheritance the DNA of the individuals is
modified by crossing or mutation. Crossing
means the exchange of random parts of the
parameter set between two individuals;
mutation is a mechanism that changes
single parameters within the set of an
individual. The descendants generated by
these algorithms build up the new
population.
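The two operators can be sketched on plain lists of numbers, each list standing for one parameter set. One-point crossover and Gaussian perturbation are common choices and are assumed here; the paper does not specify the exact operators, and the function names, mutation rate and scale are illustrative.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Exchange the tails of two parameter sets at a random cut point."""
    point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(individual, rng, rate=0.1, scale=1.0):
    """Perturb each parameter with probability `rate` (Gaussian noise)."""
    return [gene + rng.gauss(0.0, scale) if rng.random() < rate else gene
            for gene in individual]
```

Passing an explicit `random.Random` instance keeps runs reproducible. For trapezoid breakpoints, a repair step that restores a <= b <= c <= d after mutation would also be needed; it is omitted here.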

The generated new individuals are tested
iteration by iteration. The biological rule of
the survival of the fittest is applied. Every
new individual (parameter set for the
solution of the optimisation problem) is
rated in a typical environment.

The duration of these tests is too long to
calculate thousands of iterations. Instead,
an average function of typical rendering
outputs was created. This function
simulates the typical characteristic of the
rendering system, so the rating of the single
individuals can be calculated without
graphical output. If the test has a positive
result, the current parameter set is a good
solution for the membership functions. The
individual is strong enough and stays alive;
otherwise the individual dies.

The best parameter sets for the
membership functions can be found by
following only the fittest individuals. As
in evolution, only the fittest individuals
pass on their DNA. After a while the fittest
individuals are selected, and every
parameter set will deliver a good definition
of the considered membership function.
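The selection scheme described above, in which every individual is rated by a fast function and only the fittest pass on their parameters, can be sketched as a short loop. The rating function is supplied by the caller and stands in for the paper's averaged rendering model, which is not given. The population size, mutation scale and the rule that parents compete with their offspring are assumptions.

```python
import random


def evolve(rating, initial, generations=50, population_size=20,
           survivors=5, seed=0):
    """Return the best parameter set found by a simple elitist GA."""
    rng = random.Random(seed)
    # Start with many individuals that share one identical parameter set.
    population = [list(initial) for _ in range(population_size)]
    for _ in range(generations):
        # Every individual produces one mutated descendant.
        offspring = [[gene + rng.gauss(0.0, 0.1) for gene in individual]
                     for individual in population]
        # Survival of the fittest: rate parents and descendants together.
        ranked = sorted(population + offspring, key=rating, reverse=True)
        fittest = ranked[:survivors]
        # Only the fittest individuals seed the next generation.
        population = [list(fittest[i % survivors])
                      for i in range(population_size)]
    return max(population, key=rating)
```

Because parents compete with their descendants, the best rating never decreases from one generation to the next.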

Genetic algorithms are not random
walks through the search space. They use
random choice efficiently in their
exploitation of prior knowledge to locate
near-optimal solutions rapidly.

An example of the successful
manipulation of the membership
functions can be seen in Figure 3 and
Figure 4.


Figure 4: Optimised membership function

The membership function for the
distance to the current object is defined as
fuzzy sets with the conditions very near,
near, middle, far and very far. The
predefined membership functions are defined
symmetrically to give the best starting assumption for
the optimising genetic algorithm. The result
of the optimisation can be seen in Figure 4.

The defined fuzzy sets still exist; the
difference from the predefined membership
function is the modified placement and
shape of the single sets in relation to the
complete function.

The rating algorithm is a simulated walk
through a virtual world. A nonstatic change
of visible objects is simulated by a step
function response of the controller.
Figure 5 shows that the automatically
optimised controller delivers minimal
oscillation and a very fast adjustment to the
desired frame rate.

The rating algorithms are so fast that
learning on the fly can be achieved.
Permanent optimisation in running
simulations is possible. In every new virtual
world the membership functions
can be optimised. An individual
parameter definition of the controller can be
designed for every new scene [6] [7].

Figure 5: Frame rate of the controlled
rendering module

5. VIRTUAL REALITY AT THE IPA

The IPA has been at the forefront of Virtual
Reality research in Germany since 1991.
The first industrial projects were carried out in
the field of manufacturing engineering [8].
Figure 6 shows an overview of the use of
Virtual Reality systems at the IPA. There
are industrial projects in the planning of
robot work cells, visualisation of CAD data
and presentation of product ideas. Further
applications such as off-line
programming and teleoperation have been realised. Rapid
prototyping, manufacturing planning,
assembly planning, operator systems and
training systems are areas for further
development.

6.  THE  DEMONSTRATION  CENTRE 
VIRTUAL  REALITY  AT  THE  IPA 

The high interest from the public and
from industry has also extended to further
Fraunhofer-Institutes which deal with this
innovative technology. The
Fraunhofer-Society has faced the challenge of this new
technology and is now active in making this
potential accessible to industry. The project
is to be efficiently supported by the
Demonstration Centre for Virtual Reality.
In January 1993 this institution took
up its work for a period of five years [9].

Figure 6: Virtual Reality Applications at the IPA

The purpose of the Fraunhofer-Institute
Demonstration Centre for Virtual
Reality is to present new Virtual Reality
techniques to small and medium-sized
companies, to reduce reservations between
industrial practice and Virtual Reality
research, and to demonstrate prototypes of
Virtual Reality applications in a practical
and vivid way.

The following services are available in the
Demonstration Centre for Virtual Reality at
the IPA in Stuttgart:

• Dissemination of available knowledge on
Virtual Reality,

• Training of personnel from interested
companies,

• Presentation of demonstration
appliances and processes,

• Consultancy for companies,

• Application and test of new and
already available applications and

• Development and demonstration of
exemplary applications.

7.  CONCLUSION 

Virtual Reality technology makes a
valuable contribution to improving simulation
and control tools. The operator, using a
dataglove and a head-mounted stereo
display, can act in a virtual world and is no
longer a passive observer.

To get the best response from the
Virtual Reality system, the graphical output
performance must be sufficiently high. The best use of
advanced rendering functionality (which can
be combined with commercial products like
Performer (SGI) or other high-performance
rendering systems) is achieved by newly
developed control modules.

As  high  performance  can  now  be  achived 
even  with  inexpensive  hardware  advanced 
rendering  tools  for  VR  systems  can  now  be 
used  for  industrial  applications. 
Abstract: In this paper we present a new virtual reality application design platform, VIPER, which
emphasizes distributed aspects of virtual environments. Our goal is to offer the developer a generic
system which can be specialized to suit an application, while minimizing the developer's work throughout the
definition of the virtual environment. We describe the general structure of an environment in VIPER, and discuss the
distribution of such an environment.

Keywords: virtual environments, application design platform, distribution, programming environment.

1. Introduction

In order to increase the realism of virtual environments, one solution is to allow many users as well as
simulated entities to interact in a shared virtual environment. Such a virtual environment appears more
real from each user's viewpoint thanks to the liveliness of the rest of the environment. Applications and worlds
in these environments are difficult to design and simulate, mainly because of their distributed properties.

Our goal is to define a generic system for virtual reality application design and virtual environment
simulation. This system, called VIPER, has to simulate multiparticipant virtual worlds in real-time,
thanks to a distributed platform.

In  this  paper  we  present  the  general  structure  of  a  virtual  environment  in  VIPER  as  well  as  the 
framework  of  the  system  software  architecture. 

2.  Survey  of  existing  similar  systems 

Virtual environment operating systems can be roughly divided into two classes.

The first class is composed of systems dedicated to a certain type of application. Good examples of this
class are NPSNET [Zyda93], which is dedicated to military simulations, and IPA's Transputer-based
virtual reality workstation [Strommer93], which is dedicated to industrial robot control and
programming. Those systems have developed domain specific accelerating techniques: specific hardware
(the dedicated Transputer architecture of IPA's VR workstation) and optimization algorithms (dead
reckoning for NPSNET-like systems). Those systems are very efficient for their specific domain but
seem difficult to reuse for other applications.

Application generic systems compose the second class. Good examples of this class are DIVE
[Carlsson93], VEOS [Bricken93] and AVIARY [West92]. Those systems have to be as generic as
possible in order to be easily used to design any application. This genericity, however, comes at the cost of
the general efficiency of the final application. For example, in most of these systems, optimization
algorithms are difficult to implement.

In order to better deal with efficiency within a generic system, we think that tools should be offered to
the virtual environment developer. Systems like WAVES [Kazman93a] have tried to implement
optimizations in the core of the system. We rather think that the definition of such optimizations should
be available to the virtual environment developer, either declaratively (the developer chooses an
available optimization for the current environment) or explicitly (the developer implements a new
acceleration, possibly based on an existing one).

3.  Virtual  environment  structure 

VIPER is aimed at the design of every application that is based on a virtual environment which can be
modelled using the following general structure.

In VIPER, a virtual environment is made of entities, stimuli and a virtual universe (Fig. 3.1).

•  The "entity" paradigm allows uniform management of virtual world scenery, virtual objects and
clones. Entities are autonomous and own a set of attributes and behaviours. They are
conceptually grouped in "families" (a set of entities which own the same attributes and
behaviours). Our system is best suited for homogeneous virtual environments (few families made
of many instances).

•  The "stimuli" convey interactions between entities.

•  The "virtual universe" is the three dimensional environment where entities interact and behave.


Figure 3.1: Virtual environment structure

A "clone" is a special type of entity which behaves as an interface between a user [Mouli93], an
application [West92] [Carlsson93] or a robot and the virtual universe. An example of an application
clone can be a modeller such as DeM^ons [Gandrat93], a declarative multimodal modeller which is to be
integrated with VIPER. The modeller entity will be able to provide some services to a user (in fact to a
user's clone) in order to build a virtual world.

The purpose of this structure is to ease the definition of distribution schemes. Autonomous entities lead
to a perfect encapsulation of the behaviour and state of an entity, and therefore ease the distribution of
entities: such an entity can execute its behaviour on a specific site, communicating with other entities
through well defined stimuli.

In this part, in order to describe the structure of entities, we will present the inter-entity communication
mechanism and the behavioural part of entities.

Entities and their environment interact in two main ways:

-  from the environment to entities: a change in the environment produces a stimulus which is
perceived by some entities. A stimulus is received by a specific sensor which interprets and then
transmits information to the behavioural part of an entity.

-  from the entity to the environment: once the information has been analysed, the behavioural
part modifies the internal state of the entity and if necessary commands actions to the effectors
of the entity. Effectors act on the environment and while modifying it create new stimuli.

These interactions are modelled with four elements: sensors, effectors, images and image spaces.
This covers interactions internal to the environment as well as those happening with the
[...] (e.g. for user communication, devices such as datagloves and tracking systems are
encapsulated in sensors).


To each type of stimuli, we associate a quadruplet: (image, image space, sensor, effector). An image is
the perceptible part of an entity in the environment, in relation to a specific type of stimuli (e.g. a 3D
shape is the perceptible part of an entity in relation to visual stimuli). An image space is mainly the set
of images related to a certain type of stimuli. A sensor gets a "snapshot" of an image space which it
filters and interprets. While executing an action, an effector produces an image and then puts it into an
image space. In fact, each effector is first associated with a unique image in the image space, which is
modified afterwards.
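
The quadruplet described above can be illustrated with a minimal Python sketch. The class and method names (`ImageSpace.put`, `ImageSpace.snapshot`, `Effector.act`, `Sensor.sense`) are assumptions made for this illustration and are not taken from VIPER itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Image:
    """Perceptible part of an entity for one type of stimulus."""
    owner: str
    data: dict


@dataclass
class ImageSpace:
    """Set of images related to one type of stimulus (e.g. visual)."""
    images: Dict[str, Image] = field(default_factory=dict)

    def put(self, image: Image) -> None:
        self.images[image.owner] = image

    def snapshot(self) -> List[Image]:
        return list(self.images.values())


class Effector:
    """Owns one image in the space; each action modifies that image."""

    def __init__(self, owner: str, space: ImageSpace):
        self.image = Image(owner, {})
        space.put(self.image)

    def act(self, **changes) -> None:
        self.image.data.update(changes)


class Sensor:
    """Takes a snapshot of the space, then filters it."""

    def __init__(self, space: ImageSpace, keep: Callable[[Image], bool]):
        self.space = space
        self.keep = keep

    def sense(self) -> List[Image]:
        return [img for img in self.space.snapshot() if self.keep(img)]


visual = ImageSpace()
car = Effector("car", visual)
Effector("tree", visual)          # has an image but never publishes a position
car.act(position=(1.0, 2.0, 0.0))
eye = Sensor(visual, keep=lambda img: "position" in img.data)
seen = [img.owner for img in eye.sense()]
```

In this sketch the sensor never talks to the effector directly: both only see the image space, which is what allows the space to act as a mediator.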


Figure  3.2:  Inter-entity  communication  mechanism 


Introducing  an  image  space,  which  acts  as  a  mediator  in  the  inter-entity  communication,  is  very  useful  in 
order  to  distribute  a  virtual  environment  (as  will  be  explained  in  section  4.3). 

The behavioural part of an entity consists of two components (Fig. 3.3): a "reflex" component, where
entity reflexes to stimuli are defined, and a "thinking" component, where more complex behaviours can
be defined. Behaviours can be implemented by finite state machines [Carlsson93], sensors/actuators
networks [Wilhelms90] [Panne93], Prolog-like rules [Rainjonneau90], intention generators
[Xiaoyuan94], classifier systems [Torguet95] as well as interfaces to users, applications or robots.

The  reflex  component  is  made  of  a  number  of  modules.  Each  module  is  assigned  to  a  sensor  in  order  to 
analyse  its  results  and  if  necessary  react,  commanding  actions  to  effectors  and/or  modifying  its  own 
state.  The  thinking  component  is  triggered  either  by  a  clock  or  by  any  reflex  module.  This  component 
defines  the  active  side  of  the  behaviour  while  the  reflex  component  defines  the  reactive  side. 
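
As a rough illustration of the reactive and active sides, the following Python sketch wires one reflex module to a sensor and lets the thinking component be triggered either by a clock or by that reflex. All names (`Entity`, `on_sensor`, `on_clock`, `proximity_reflex`) and the toy behaviour are hypothetical.

```python
class Entity:
    def __init__(self):
        self.state = {"alarm": False, "plan": "idle"}
        self.commands = []          # actions commanded to effectors
        self.reflex_modules = {}    # sensor name -> reflex function

    def on_sensor(self, sensor_name, reading):
        """Reactive side: the module assigned to this sensor analyses its result."""
        module = self.reflex_modules.get(sensor_name)
        if module is not None and module(self, reading):
            self.think()            # a reflex module may trigger thinking

    def on_clock(self):
        """Active side: thinking may also be triggered by a clock."""
        self.think()

    def think(self):
        self.state["plan"] = "flee" if self.state["alarm"] else "wander"


def proximity_reflex(entity, distance):
    """React when something is closer than 1.0 (arbitrary unit)."""
    if distance < 1.0:
        entity.state["alarm"] = True
        entity.commands.append("stop")
        return True
    return False


e = Entity()
e.reflex_modules["proximity"] = proximity_reflex
e.on_clock()
plan_before = e.state["plan"]
e.on_sensor("proximity", 0.5)
plan_after = e.state["plan"]
```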


The entity also has its own state: a set of attributes (position, velocity, orientation, mass, 3D shape...)
with a set of personal constraints (constraints which act on the entity attributes, e.g.
velocity < maximum_velocity).


4.  Software  architecture  of  VIPER 

The software architecture of VIPER consists of four layers (Fig. 4.1):

•  At  the  bottom,  the  distributed  platform,  which  is  composed  of  a  set  of  sites  (workstations, 
multicomputers...),  creates  a  virtual  machine  based  on  a  message  passing  system  like  PVM 
[Sunderam90].  This  layer  encapsulates  the  communication  system  used  by  VIPER. 

•  Above  this  layer,  an  object-oriented  concurrent  programming  environment  encapsulates  data 
distribution  and  an  SPMD  (single  program  multiple  data)  model  [Moisan93]. 

•  The  third  layer  is  the  entity  model,  already  described  in  the  third  section  of  this  document. 

•  The  topmost  layer  allows  the  definition  of  specific  virtual  worlds  and  the  choice  of  their 
distribution  scheme. 


Figure 4.1: The software architecture of VIPER

4.1.  The  distributed  platform 

The distributed platform is the set of sites which take part in the simulation of the virtual environment. It
is logically divided into four sets of resources: computing, rendering, storage and input/output resources.
Each site is referenced in at least one of these sets. For example, a graphics workstation is referenced
in all of them.

Rendering resources are the set of sites which have rendering-specific hardware. A graphics board and a
three-dimensional sound rendering board are good examples of rendering resources. These resources are
usually connected to sensorial devices such as head mounted displays or headphones. This set is also
divided into subsets of similar rendering resources.

Input/output resources are the sites which have input/output ports used by VIPER. This set is divided
into three groups: acquisition resources, which allow access to virtual reality devices (such as datagloves
and tracking systems), standard input/output resources (console) and operating resources, which operate
external systems (such as robot control boards). Acquisition and operating resources are further
divided into subsets of similar devices.

4.2. Distributing an entity over the platform

In order to be easily distributed over the platform, each entity is divided into a number of parts called
"sub-entities" (Fig. 4.2). These sub-entities are: rendering, input/output, storage, behavioural and general
sub-entities.

Each rendering sub-entity is composed of rendering-specific sensors and attributes. Each input/output
sub-entity is composed of acquisition sensors or operating effectors and attributes that are specific to a
type of input/output. The storage sub-entity is made of all storage effectors and storage specific
attributes. The behavioural sub-entity is composed of all the other sensors and effectors, reflexes,
thinking, personal constraints and behavioural specific attributes. The general sub-entity is made of the
set of attributes which are not specific to one of the other sub-entities.


Figure 4.2: An entity divided into sub-entities

Each rendering sub-entity is given to one of the matching rendering resources. Each input/output
sub-entity is similarly given to a matching resource. The behavioural sub-entity is given to one of the
computing resources. The storage sub-entity is given to one of the storage resources. The general
sub-entity is duplicated over all resources where the entity has a sub-entity. In practice, all the resources
of an entity ought to belong to the same site (e.g. a multiprocessor graphics workstation) or to a few tightly
connected sites, and the system always tries to satisfy this property.
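
The placement rules above can be sketched in Python as a simple greedy assignment. The function `place_entity`, the resource labels and the "prefer an already used site" heuristic are assumptions for illustration only; the text does not say how VIPER actually chooses a site, and the sketch handles only one sub-entity of each kind.

```python
# Which kind of resource each sub-entity needs (labels are hypothetical).
REQUIRED_RESOURCE = {
    "rendering": "rendering",
    "io": "io",
    "behavioural": "computing",
    "storage": "storage",
}


def place_entity(sub_entities, sites):
    """Assign each sub-entity to a site offering the matching resource.

    sub_entities: list of sub-entity kinds, one of each kind at most.
    sites: dict mapping site name -> set of resource kinds it offers.
    """
    placement = {}
    used = []
    for sub in sub_entities:
        need = REQUIRED_RESOURCE[sub]
        candidates = [s for s, kinds in sites.items() if need in kinds]
        if not candidates:
            raise ValueError(f"no site offers a {need} resource")
        # Greedy heuristic: reuse a site that already hosts part of the entity.
        chosen = next((s for s in candidates if s in used), candidates[0])
        placement[sub] = chosen
        if chosen not in used:
            used.append(chosen)
    # The general sub-entity is duplicated on every site that hosts a part.
    placement["general"] = list(used)
    return placement


workstation = {"ws1": {"computing", "rendering", "io", "storage"},
               "srv": {"computing", "storage"}}
single_site = place_entity(["rendering", "io", "behavioural", "storage"], workstation)

split = {"srv": {"computing"}, "gfx": {"rendering"}}
two_sites = place_entity(["behavioural", "rendering"], split)
```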

4.3.  Distributing  Virtual  Worlds 

This  distribution  of  an  entity  introduces  a  functional  parallelism  which  is  internal  to  the  entity.  A  more 
general  parallelism  within  the  virtual  universe  can  be  achieved.  A  virtual  universe  is  a  homogeneous 
environment  inhabited  by  entities  which  behave  in  a  similar  way. 

Object  parallelism  has  been  mainly  used  in  existing  systems  because  of  its  simplicity  [Kazman93a] 
[Snowdon93].  The  main  advantage  of  this  parallelism  is  to  maintain  a  good  load  balancing.  Obviously  if 
each  site  is  given  the  same  load  of  entities  each  one  would  become  equally  loaded.  Moreover,  a  strict 
encapsulation  of  entities  allows  the  introduction  of  dynamic  load  balancing  [Kazman93b]. 

However,  in  order  to  detect  and  manage  direct  or  indirect  interactions  (collisions  or  information 
exchange  in  the  environment),  communication  between  sites  introduces  a  great  number  of  problems 
(bottle-necks  on  an  interaction  specific  site  for  example). 


VIPER proposes to solve these problems with distributed image spaces. Communication between
entities through image spaces has the advantage of being well defined and easily manageable.

We have currently defined three kinds of distributed image spaces:


-  fully distributed image spaces. In those image spaces, images are only situated on their entity's
computing resource and are remotely accessible. For example in Figure 4.3, when each entity effector
creates an image of the entity for the first time, each image exists on the site where it has been created.
Afterwards, modifications of these images are only done locally. The entity on the third site is able to
get every image present in the image space. This is done totally transparently: an access to a distant
image is exactly the same as an access to a local image as far as the sensor is concerned, the only
difference being, obviously, access time. We think that this first type of image space is interesting when
there are few sensor-equipped entities which retrieve information only from time to time, whereas effectors
modify images very often. An example of such an image space can be a low frequency radar image
space, where entities' images are their positions in space and sensors are radars which examine the image
space at a low frequency.


Figure  4.4:  A  fully  duplicated  image  space 

-  fully duplicated image spaces, where images are duplicated over all computing resources and
image updates are propagated. For example in Figure 4.4, when each effector creates an image, the
image is created on every site the image space is accessible from. Each modification of an image is
thereafter propagated to all sites via update messages transparently sent by the image space itself.
When, for example, the sensor of the third site's entity examines the image space, it only looks at local
copies of each image. This type of image space is very interesting when there are a lot
of sensors sensing the image space often. An example of such an image space is the one used to mediate
visual communication between entities. In this image space, entities output their changes in appearance
(such as a position modification when moving) and other entities sense these changes in order either to
modify their behaviour or to display a view of the 3D world to their users (for users' clone entities).

-  smart duplicated image spaces, where images are duplicated only over the computing resources which
need them (i.e. those where a sensor is currently connected to the image space) (Fig. 4.5). This is a
derivative of duplicated image spaces, useful when there are few sensors (less than one at each site). An
example of such an image space can be a sound mediating image space, which manages distribution of
sounds produced by users or virtual objects and listened to by users who have sound producing boards.


Figure  4.5:  A  smart  duplicated  image  space. 
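
The three kinds of distributed image space differ mainly in which sites hold a copy of a given image, and hence in how many update messages one modification costs. The Python sketch below captures only that difference; the function names and the rule that the owner site always keeps its own copy in the smart case are assumptions of this illustration.

```python
def replica_sites(kind, owner_site, all_sites, sites_with_sensors):
    """Return the set of sites that hold a copy of an image."""
    if kind == "fully_distributed":
        return {owner_site}                      # remote readers fetch on demand
    if kind == "fully_duplicated":
        return set(all_sites)                    # every site holds a copy
    if kind == "smart_duplicated":
        return {owner_site} | set(sites_with_sensors)
    raise ValueError(f"unknown image space kind: {kind}")


def update_messages(kind, owner_site, all_sites, sites_with_sensors):
    """Messages sent when the owner's effector modifies its image once."""
    replicas = replica_sites(kind, owner_site, all_sites, sites_with_sensors)
    return len(replicas - {owner_site})


sites = ["s1", "s2", "s3", "s4"]
costs = {
    kind: update_messages(kind, "s1", sites, ["s3"])
    for kind in ("fully_distributed", "fully_duplicated", "smart_duplicated")
}
```

With four sites and a single sensor on `s3`, a modification costs no message in the fully distributed case (the cost is paid at read time instead), three in the fully duplicated case, and one in the smart duplicated case.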

However, in order to better solve some of the previously mentioned problems, we think that we should
take into account that, in most communication types, two entities may only interact if they are close
enough. This can be done by associating topological information with image spaces.

A logical spatial partition of image spaces seems interesting for worlds which are logically divided into
rooms linked by portals (doors, openings...). Each entity is associated with the room in which it is spatially
present, and each site will only manage interactions which involve its entities (i.e. it needs only the part of
each image space which is local to its entities' rooms). For example, if we consider a virtual building, an
entity may only interact with entities located in the same room or in adjacent rooms (if there are open
doors).


A regular space partition can also be considered [Zyda93]. In this case, the virtual world space will be
divided into grid squares of a constant size. Such a grid can either be two dimensional (in the case of
applications in which altitude doesn't matter much: vehicle simulation for example) or three dimensional
(applications where altitude is very important: flight simulation for example). Each entity is associated
with the grid zone in which it is spatially present, and each site will only manage interactions which
involve its entities (i.e. it needs only local parts of each image space). An example of the use of topological
information is presented in Figure 4.6. In this figure, the plane is only able to detect two vehicles
because the image space, being aware of the limitations of the plane's sensors, only takes into account a part of the
virtual world when listening to vehicle image updates.
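
A regular partition of this kind can be sketched in Python as follows. The helper names and the choice of treating the observer's own cell plus its directly adjacent cells as "local" are assumptions for illustration; the text does not specify how large the relevant neighbourhood is.

```python
import math
from itertools import product


def cell_of(position, cell_size, dims=2):
    """Grid cell containing a position; dims=2 ignores altitude."""
    return tuple(math.floor(c / cell_size) for c in position[:dims])


def neighbourhood(cell):
    """The cell itself plus every directly adjacent cell."""
    offsets = product((-1, 0, 1), repeat=len(cell))
    return {tuple(c + d for c, d in zip(cell, off)) for off in offsets}


def relevant_entities(observer_pos, entities, cell_size, dims=2):
    """Names of entities whose cell is local to the observer."""
    near = neighbourhood(cell_of(observer_pos, cell_size, dims))
    return sorted(
        name for name, pos in entities.items()
        if cell_of(pos, cell_size, dims) in near
    )


vehicles = {
    "v1": (12.0, 3.0, 0.0),
    "v2": (95.0, 40.0, 0.0),
    "v3": (-4.0, 8.0, 0.0),
}
plane = (5.0, 5.0, 300.0)
seen_2d = relevant_entities(plane, vehicles, cell_size=10.0, dims=2)
seen_3d = relevant_entities(plane, vehicles, cell_size=10.0, dims=3)
```

With a two dimensional grid the plane's altitude is ignored and it perceives the two nearby vehicles; with a three dimensional grid of the same cell size its altitude places it far from every vehicle cell.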


Figure  4.6:  A  smart  duplicated  image  space  with  topological  information. 

An a priori choice of a distribution model is difficult to make. In fact this choice is really application
dependent. For example, for a flight simulation a regular image space partition seems interesting, while
a logical space partition seems more useful for an architectural walk-through. In VIPER the application
developer is able to choose a specific distribution scheme for each world and for each image space.
Moreover, the developer can specialize an already defined image space in order to better suit the application's needs.

Thus, the developer can quickly prototype an application using existing general distributed image
spaces. Then, in order to increase efficiency, the developer may try other distributed image spaces or explicitly
implement new optimization algorithms. For example, the developer of a vehicle simulation application
might choose a smart duplicated image space to mediate visual communication between entities
(vehicles), then prefer a regularly partitioned image space, and finally implement dead
reckoning of vehicles.
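
Dead reckoning, mentioned above as a possible final optimization, is commonly implemented by having every site extrapolate a vehicle's position from its last published state, while the owner sends a new update only when the true position drifts too far from that extrapolation. The Python sketch below shows this general idea; the class name, the linear extrapolation and the threshold value are illustrative assumptions and do not describe VIPER or NPSNET code.

```python
import math


def extrapolate(pos, vel, dt):
    """Linear prediction of a position after dt seconds."""
    return tuple(p + v * dt for p, v in zip(pos, vel))


class DeadReckoningSender:
    """Decides whether the owner of a vehicle must publish an update."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.last = None            # (time, position, velocity) last published

    def maybe_update(self, t, pos, vel):
        """Return True when an update message has to be sent."""
        if self.last is not None:
            t0, pos0, vel0 = self.last
            predicted = extrapolate(pos0, vel0, t - t0)
            if math.dist(predicted, pos) <= self.threshold:
                return False        # remote extrapolation is still good enough
        self.last = (t, pos, vel)
        return True


sender = DeadReckoningSender(threshold=0.5)
sent = [
    sender.maybe_update(0.0, (0.0, 0.0), (1.0, 0.0)),   # first state
    sender.maybe_update(1.0, (1.0, 0.0), (1.0, 0.0)),   # moving as predicted
    sender.maybe_update(2.0, (2.0, 1.0), (1.0, 1.0)),   # vehicle has turned
]
```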


5.  First  results  and  Conclusion 

A first implementation of VIPER is being developed. It is based on PVM over a heterogeneous network
of workstations (SGIs, HPs and SUNs). The PVM communication system was chosen
in order to rapidly prototype VIPER. Indeed, PVM offers interesting tools like xpvm, which
ease the definition of parallel applications and their tuning. However, the overhead added by this
communication environment doesn't seem to match the requirements of large scale virtual environments.
Thanks to the encapsulation of the communication system, VIPER can, nevertheless, be easily ported
to any more efficient communication system.

VIPER is being developed using the C++ programming language, which offers very interesting tools
for developing generic constructs. C++ templates, for example, allow the definition of a generic
distributed image space which can be parameterized by a specific class of image (and possibly
derived), in order to suit application needs.


The entity model, fully distributed image spaces and fully duplicated image spaces have already been
implemented and are being benchmarked. We are now developing our first application: a modeller which
gives users the ability to create new shapes by deforming simple ones. This first application will
allow us to estimate the efficiency of VIPER when confronted with a real problem.

This  first  implementation  will  then  be  extended  with  smart  image  spaces  and  virtual  worlds  topological 
information.  The  integration  of  a  multicomputer  in  the  platform  is  also  being  studied. 
Abstract

We have developed two calculation methods
for manipulation in virtual environments.
The Impetus Method is based on only one phenomenon,
the collision between the fingertip and the
object. We start with the simple manipulation
of pushing the object with one degree of freedom
and expand it to treat more complicated
phenomena. RSPM realizes detailed manipulation.
By using this, the object can be
grasped and manipulated by 3 fingertips.

1:  Introduction 

Natural and realistic object manipulations in virtual environments
(VE) are important because true applications necessarily
contain object manipulation [8]. Moreover, the
operator can recognize the law that dominates the behavior
through the act of manipulation (active presence) in addition
to the (passive) presence through the visual sense (Figure 1).
Until now, the realized manipulations have been generally
based on gestures, and these manipulations seem to be very
simple, symbolic, and not realistic.

The manipulation is realized by calculating the behavior of
the object driven by a virtual hand. The behavior of the virtual
object is an artifact. We can define any behavior. It appears
that we can easily generate the behavior similar to that
in the real world by calculations based on physical laws. There
are, however, cases in which physical laws cannot be applied,
especially in VE without force-feedback.

There have been many attempts and inventions for easier
manipulation. Although some of them work well and effectively,
it is difficult to combine different manipulation
algorithms according to the various situations in which the object
is located. What seems to be lacking is an exploration of the
nature of the difficulties in such calculations, and a systematization
of the methodology.
Figure 1 Realistic Manipulation Causes Active Presence

The goal of this study is to develop effective manipulation
algorithms, and to make clear how we should manage different
manipulation calculations. Although this paper does not
fully achieve this goal, two example manipulation calculations
with fingertips in VE without force feedback are developed,
and several discussions are presented based on this development
to generalize the behavior calculation.


First, the authors discuss the calculation for object manipulation
in general.

Second, the "Impetus method" is described. It is based
on a simple phenomenon, the collision between fingertip and
object surface. We start with simple movements with 1 degree
of freedom. This is naturally expanded to represent movement
with 2/3 degrees of freedom, static and dynamic friction,
stiffness, rotational moment, and restriction on a surface.
This method belongs to the category of semi-dynamics.

Third, a "representative spherical plane method" for grasping
and manipulating an object by 3 fingertips is described. It is
based on the restricted interaction among three fingertips and
a sphere that represents the object. This method enables detailed,
fine manipulation by 3 fingertips. This belongs to the
region of kinematics.

Finally, the above methods and future work
are discussed.


2: Generation of Object Behavior

The calculation of the interaction or of the object's behavior
in VE has different characteristics from those in popular
Graphical User Interfaces (GUI), such as drag, pull-down, etc.
In the case of VE, it is more complicated. Generally, it has 3
degrees of freedom, and utilizes multiple contact points (fingertips,
etc.). The number of states of the manipulated object
is larger in VE than in GUIs. The largest difference lies in
detecting which model the behavior of the object belongs to.
In the case of VE, it should be based on geometrical information,
while in GUIs it is mainly based on symbol information
such as a mouse click.

Symbol information simplifies the detection in exchange
for giving up some detailed behavior. For example, in a VE,
if the gesture of a hand (symbol information) mainly decides
whether the hand grasps the object or not, the object is grasped
without considering the geometrical relation between the fingertips
and the surface of the object. These calculations based
on symbol information are not suitable for generating detailed
behavior.

2.1 Problem and Assumptions

In the case of VE without force feedback, the position/orientation
of the hand or fingertip is monitored by sensor(s),
and this data forms a virtual hand that interacts with the object.
Namely, the hand's position as input is used to drive the
object, and there is no output to the real hand. The data flow
between the real hand and the virtual object is basically only
one way.

In the case of VE with force feedback, the most general method
is to sense the position of the hand, to calculate the interaction
based on the position, and to display the impedance, force
or the relation between position, velocity and force. In this
case, the data flow between the real hand and the virtual
object is two way (Figure 2.1).

Figure 2.1 Dataflow (panels: Without Force Feedback; With Force Feedback; elements: Real Hand, Virtual Hand, Virtual Object; signal: Position)

What is important here is (Figure 2.2):

1: Only the position is obtained as input from the real
world

This means we cannot include the term of force in the calculation.
If we introduce some assumption that defines the
relationship between the force and the situation (position, velocity
and the other terms that have been obtained), the force
can be obtained. Because the assumption dominates all of
the calculations which are to follow, it should be chosen very
carefully [6].

2: There is no output from the virtual world to the real
hand

This means the movement of the real hand cannot be restricted,
or cannot be modified by the reaction force from the
virtual object.

Therefore, the position of the real hand could easily become
different from that of the virtual hand. This is a serious
problem because the calculation based on the position data
implicitly assumes that it is based on the actual position of
the real hand. To solve this problem, an assumption that defines
the relationship between the real hand and virtual hand
is needed. This assumption should also be chosen very carefully
[6].


Figure  2.2  Calculation  Problems  in  VE  without  Force  Feedback 
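
To make the two assumptions discussed above concrete, here is a one-dimensional Python sketch. It uses a penalty (spring) model for the force and clamps the virtual fingertip to the object surface. Both choices, the function names and the stiffness value are assumptions picked for illustration; they are not the assumptions adopted by the authors, which the text leaves to reference [6].

```python
def assumed_contact_force(finger_x, surface_x, stiffness=200.0):
    """Force derived from position only, under a spring assumption.

    The object occupies x >= surface_x and the fingertip approaches from
    smaller x. The force is taken proportional to the penetration depth.
    """
    penetration = finger_x - surface_x
    return stiffness * penetration if penetration > 0.0 else 0.0


def virtual_finger(real_finger_x, surface_x):
    """Assumed relation between real and virtual fingertip positions.

    The real hand cannot be stopped, so the virtual fingertip is kept on
    the surface whenever the real one has moved inside the object.
    """
    return min(real_finger_x, surface_x)


outside = assumed_contact_force(0.90, 1.0)
inside = assumed_contact_force(1.02, 1.0)
shown_at = virtual_finger(1.02, 1.0)
```

Everything computed afterwards (friction, motion of the object) inherits these two choices, which is why the text insists that they be chosen carefully.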


2.2  Kinematics,  Dynamics 

As an example, let us think about the case where a solid
object is grasped by three fingers. This activity can be divided
into three phases (Figure 2.3).

(a): Free

No finger comes into contact with the surface of the object.
The behavior of the object can be calculated based on physical
laws such as Newtonian Physics. If the world contains
gravity, the object will fall freely. If the world contains viscous
resistance, the velocity of the object will decrease gradually.
Because the term of force appearing in the calculation is
not the force between finger and object, there is no problem.
This belongs to the category of dynamics.

(b): Restrictive

If the fingers grasp the object tightly, the position and orientation of the object are strongly related to the positions of the fingertips. This can be calculated based on Kinematics. Kinematics, as a part of Mechanics, deals with restrictive phenomena such as a pair of gears that engage each other or a set of arms connected by links. The behavior of a grasped object is similar to such phenomena. Kinematics does not use the force term. Therefore this belongs to the category of Kinematics-like methods. Section 4 corresponds to this region. Kinematics-like methods simplify the phenomena, avoiding both the force term and the need for wider bandwidth (a higher calculation and sensing rate) [5][7].


(c): Boundary

When some of the fingers are very close to the surface of the object, Kinematics is not suitable because the relation between the fingertips and the object changes rapidly. Dynamics as a part of Physics is also not suitable, because there is a force between the fingertip and the object and because it is suited to uniform or "flat" phenomena. Dynamics as a part of Mechanics deals with the force term explicitly, but it focuses on particular mechanisms and introduces assumptions to simplify the theory and the calculation. Roughly speaking, this category is similar to that of Dynamics. Section 3 corresponds to semi-Dynamics.

Calculation based on symbolic information, such as the gesture of the hand, neglects this category. Symbolic information forcibly divides the state of the object into (a) and (b).

Thus with force feedback, the physical model for this category is not perfect. For object manipulation by robot manipulators, there are several hypotheses and theories.


Figure 2.3 State of Object and Applicable Methods

Figure 2.4 An Example of a Set of Behavioral Laws and State Transition (legend: calculation method (behavior model) based on dynamics; based on kinematics)


2.3 State Transition

The term "situation" can be defined as a set of parameters that describes both the state of the object and the state of the surrounding environment. For example, suppose that the virtual world consists of a virtual hand and a virtual object. The state of the object is a set of parameters such as the position, size, and velocity of the object. Because there is only the hand besides the object, the state of the surrounding environment is a set of parameters such as the position and gesture of the hand. A situation is the combination of the two. All the parameters form a configuration space of the kind used in computational kinematics [4]. One situation is a point in configuration space.

The object changes its behavior across these categories, and one category contains multiple regions of situations. One region corresponds to one calculation model of behavior.

When a calculation model is defined, its applicable region in the configuration space is restricted. Therefore detecting which region the object belongs to, that is, which calculation model suits the object, is important for managing the overall calculation [3].

Fragmentary calculation methods that are not systematized are prone to error. A minor error in one model of behavior sometimes causes an unreasonable state transition and greatly changes the overall behavior.

3: Impetus Method

3.1 Basic Concept

This method aims to calculate the behavior of solid objects manipulated with fingertip(s). The applicable manipulations and neighboring regions are those of pushing an object with finger(s), as shown in Figure 3.1.

We start by clearly defining the set of axioms, including the fundamental quantity that causes the movement of the object. The second step involves detecting the applicable region from the axioms. The third step is to expand this systematically to more complicated phenomena.

In other words, the third step is the purpose, which depends on the second. The second step connects the state transitions and models correctly, which need to be clearly defined.

Figure 3.1 Applicable Phenomena and Neighboring Region of Impetus Method

3.2 Axioms

The axioms are the following:

Law 1: "The reason for the change of motion is impulse force, which is defined as the invasion vector of the fingertip" [1]. This needs the following assumptions to enable calculation:

Assumption 1: "Reaction force from the object to the finger is negligible". This requires several secondary assumptions, such as the mass of the object being zero.

Assumption 2: "No element can invade the object" [2]. Neither the finger nor any other object can enter the inside of the object being manipulated, not only at the calculation points on the time axis but also at the times between calculation points.

3.3 The calculation and the detection of the boundary of the applicable region

Figure 3.2 shows a basic calculation in which one fingertip pushes the surface of an object. Here, all the movements have only one degree of freedom.

At time index n, the position of the fingertip is outside the object. At the next calculation step, the fingertip position (obtained from the sensor) invades the object, so a collision occurs between time n and n+1. Using linear interpolation, the correct collision time and the trajectories of the fingertip and the object are calculated. Namely, the object position at n+1 is calculated from the fingertip positions at n and n+1 and the object position and velocity at n, using simple interpolation.


Po'[n+1] = Po[n] + Vo[n]*dt
I = Po'[n+1] - Pf[n+1]

Po[n+1] = Po'[n+1] + I

As the invasion vector increases or decreases, the distance from the fingertip to the surface of the object increases or decreases correspondingly. The invasion vector acts similarly to an impetus or impulse force.
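The one-degree-of-freedom step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the fingertip is assumed to push in the positive direction of the axis, and the invasion is taken as the fingertip position minus the predicted surface position so that adding it moves the surface out to the fingertip. That orientation is an assumption, since the sign convention depends on how the axis of Figure 3.2 is drawn.

```python
def collision_fraction(pf_n, pf_next, po_n, po_pred):
    """Fraction (0..1) of the time step at which fingertip and surface meet.

    Both trajectories are linearly interpolated between time n and n+1.
    Returns None when no collision happens inside the step.
    """
    gap_start = po_n - pf_n        # > 0: surface is ahead of the fingertip
    gap_end = po_pred - pf_next    # < 0: fingertip has passed the surface
    if gap_start <= 0 or gap_end >= 0:
        return None
    return gap_start / (gap_start - gap_end)


def impetus_step(po_n, vo_n, pf_next, dt):
    """Return (object position at n+1, invasion) for one time step."""
    po_pred = po_n + vo_n * dt     # Po'[n+1]: position without any push
    invasion = pf_next - po_pred   # assumed sign: depth of the fingertip
    if invasion <= 0:              # fingertip still outside the object
        return po_pred, 0.0
    return po_pred + invasion, invasion
```

For example, with the surface resting at 1.0 and the fingertip moving from 0.0 to 2.0 in one step, the collision is placed halfway through the step and the surface is carried to 2.0.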

The impulse from the fingertip is modified systematically by attributes such as the collision factor, mass/rotational inertia ratio, dynamic friction coefficient, static friction limit, and restriction on the surface. It should be noted that these (virtual) attributes are not the same as the attributes in physics, but cause effects similar to those in physics. Hence, we can easily control the properties of the phenomena.

(Collision factor)

The Impetus I in equation (1) is modified by the collision factor "e", which should range from 0.0 to 1.0.

Po[n+1] = Po'[n+1] + e * I

(Damping) 

Next, a damping coefficient d is introduced. This represents phenomena such as the friction between the objects being manipulated and their surrounding environment. As damping increases, the movement of the object decreases. The Impetus I in equation (1) is modified in the following manner by introducing the damping factor "d", which should range from 0.0 to 1.0.

Figure 3.2 Calculation Based on Invasion Vector (legend: P: Position, V: Velocity, f: Finger, o: Object, I: Invasion Vector, n: Time Index)

Po[n+1] = Po'[n+1] + e * d * I
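As a small sketch of the scaled update, the following hypothetical helper applies both factors to a one-dimensional impetus. It follows the formula as printed, so a factor of 1.0 passes the impetus through unchanged and a factor of 0.0 removes it; the range check is an added assumption.

```python
def apply_modifiers(po_pred, impetus, e=1.0, d=1.0):
    """Po[n+1] = Po'[n+1] + e * d * I for scalar positions."""
    if not (0.0 <= e <= 1.0 and 0.0 <= d <= 1.0):
        raise ValueError("e and d should range from 0.0 to 1.0")
    return po_pred + e * d * impetus
```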

(3D  movement) 

This  is  naturally  expanded  into  movement  with  3  degrees 
of  freedom. 

(Friction) 

Static and dynamic friction between the finger and the surface of the object are introduced. Until now, only the magnitude of the Impetus has been modified, by 2 factors. Here, the Impetus is divided into the normal element, which is orthogonal to the object surface, and the parallel element, which is parallel to it.




Only the parallel element is modified to represent the dynamic friction. As the dynamic friction coefficient increases, the parallel element decreases. The dynamic friction coefficient should range from 0.0 to 1.0.

parallel element -> parallel element * dynamic friction coefficient

To detect whether a state belongs to the static friction region or the dynamic friction region, the ratio of the parallel element to the normal element is used. If the situation belongs to the static friction region, the parallel element is not modified.

The static friction limit should be larger than the dynamic friction coefficient.

if (parallel element / normal element > static friction limit)
    state = dynamic friction
else
    state = static friction
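The decomposition and the region test can be sketched in three dimensions as below. The function and parameter names are hypothetical, the surface normal is assumed to be supplied by the collision test, and the parallel element is multiplied by the coefficient exactly as in the printed rule (so a coefficient of 1.0 leaves it unchanged). The ratio test is written without a division so that a zero normal element does not cause an error.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def apply_friction(impetus, surface_normal, dynamic_coeff, static_limit):
    """Return (state, modified impetus) for a 3D impetus vector."""
    n = scale(surface_normal, 1.0 / norm(surface_normal))
    normal_part = scale(n, dot(impetus, n))    # orthogonal to the surface
    parallel_part = sub(impetus, normal_part)  # tangent to the surface
    # Same test as parallel/normal > static_limit, but safe for normal == 0.
    if norm(parallel_part) <= static_limit * norm(normal_part):
        return "static", tuple(impetus)        # parallel element unchanged
    parallel_part = scale(parallel_part, dynamic_coeff)
    return "dynamic", add(normal_part, parallel_part)
```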

(Rotational  Movement) 

This is expanded into rotational movement by introducing a rotational inertia ratio "r". The inertia ratio is similar to the ratio of rotational inertia to mass. It indicates the ease of rotation relative to that of parallel movement. When the mass of the object is concentrated around the gravity center, the object is easy to rotate compared with parallel movement. When the mass lies mainly on the surface of the object, it is relatively difficult to rotate.

Here, the Impetus is divided into two components. One is the parallel component, which is parallel to the vector from the fingertip to the center of the object. The other is the rotational component, which is the rest of the Impetus. This should cause the rotational movement.

rotational Impetus = rotational component * r * distance from the center of the object to the fingertip

The rotational impetus is used to cause rotation. In our experimental system, it is directly converted to a rotation for one time step (from time n to n+1); continuous rotation is not supported.
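The split into parallel and rotational components can be sketched as follows. The names are hypothetical, and how the resulting rotational impetus is turned into an actual rotation angle is not specified in the text, so the sketch stops at the vector quantities.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def split_impetus(impetus, fingertip, center, r):
    """Return (parallel component, rotational impetus).

    The parallel component lies along the line from the fingertip to the
    object center; the remainder is scaled by r and by the distance from
    the center to the fingertip.
    """
    arm = sub(center, fingertip)
    dist = math.sqrt(dot(arm, arm))
    if dist == 0.0:                      # fingertip at the center: no torque
        return tuple(impetus), (0.0, 0.0, 0.0)
    u = scale(arm, 1.0 / dist)
    parallel = scale(u, dot(impetus, u))
    rotational = sub(impetus, parallel)
    return parallel, scale(rotational, r * dist)
```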


(Restrictive  movement  on  a  plane) 

The restriction of movement to a plane is introduced. This is achieved by simply eliminating the component perpendicular to the restriction plane.
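Eliminating the perpendicular component is a vector projection. A minimal sketch, with a hypothetical function name and the plane given by its normal vector (which need not be unit length):

```python
def restrict_to_plane(impetus, plane_normal):
    """Remove from the impetus its component along the plane normal."""
    n2 = sum(n * n for n in plane_normal)
    if n2 == 0:
        raise ValueError("plane normal must be non-zero")
    k = sum(i * n for i, n in zip(impetus, plane_normal)) / n2
    return tuple(i - k * n for i, n in zip(impetus, plane_normal))
```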

3.3 Region detection

The applicable region of this method is detected by investigating whether or not the result of the calculation satisfies assumption 2. The problem is that "the result satisfies assumption 2" does not necessarily mean "the values used in the calculation satisfy assumption 2". To solve this problem, we introduce the following restriction on the calculation: "If the object satisfies assumption 2 in the time interval (t1..t2), it satisfies assumption 2 at t2."

3.4  Experiment  System 

Figure 3.3 shows the system used for the experiments. The computation of the behavior and the generation of graphics are performed on an IRIS VGX-210 workstation. A Polhemus IsoTrak and a DataGlove are used to sense the fingertip position. All programs are written on VisAge, a toolkit for generating virtual environments. VisAge was developed by the first author of this paper and consists of libraries and tools for developing virtual reality applications.

Figure 3.3 System for Experiment

VisAge contains a simple management structure for behavior calculations. Objects are maintained in the form of a tree. Each object is defined as a set of attributes and a pointer to a function that defines its behavior. The function is called automatically by the management structure as it traverses the tree. As a programming style, it is recommended to divide the function into two parts: the detection of the applicable region and the behavior calculation itself. A state transition is implemented by changing which function the pointer indicates.
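The management structure can be illustrated with a short sketch. It does not reproduce the VisAge API, which is not given in the text: the class, the two behavior functions, and the entries of the world dictionary are all hypothetical. It only shows the recommended style, in which each behavior function first detects whether its region still applies and otherwise hands the object over by reassigning the function reference.

```python
class VirtualObject:
    """A tree node holding attributes and a reference to a behavior function."""

    def __init__(self, name, behavior, **attributes):
        self.name = name
        self.behavior = behavior
        self.attributes = dict(attributes)
        self.children = []


def free_behavior(obj, world):
    # Part 1: detection of the applicable region.
    if world["finger_distance"] <= obj.attributes["grasp_distance"]:
        obj.behavior = grasped_behavior      # state transition
        return
    # Part 2: behavior calculation (a toy constant fall per step).
    obj.attributes["height"] -= world["fall_per_step"]


def grasped_behavior(obj, world):
    if world["finger_distance"] > obj.attributes["grasp_distance"]:
        obj.behavior = free_behavior         # state transition
        return
    obj.attributes["height"] = world["finger_height"]


def update_tree(node, world):
    """Call every object's current behavior function, parents first."""
    node.behavior(node, world)
    for child in node.children:
        update_tree(child, world)
```

Calling `update_tree` once per cycle is enough; which model runs for each object is decided entirely by the function reference stored in that object.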

The experiment in section 4 is performed on the same system.

3.5:  Experimental  Result 

The experimental task was to place an object in the indicated region by pushing the object with one finger. Case 1 has 3 degrees of freedom of parallel movement in free space. Case 2 has 2 degrees of freedom of parallel movement and 1 degree of freedom of rotation on a plane. In both cases, the results show that this manipulation is easier than general symbolic manipulation.

In case 1 especially, the user made proper use of the 2 states of friction according to the phase of the task. This contributes to the improvement in performance (Figure 3.3, 3.4).

The results also showed that the user could "feel" the (virtual) attribute (friction ratio). In the real world, physical quantities are received as stimuli and give rise to sensations. Virtual attributes give rise to sensations through the recognition of relations between the motion of the hand and the behavior of the object.


Figure 3.4 Experiment 1: (a) Experimental Task (100 mm vertical offset, horizontal offset); (b) Methods vs. Task Time for Each Phase of Task (method based on gesture; Impetus method with friction = large, medium, small; phases of task: approach, through, both)


4:  Representative  Spherical  Plane  Method 
4.1  Basic  Concept 

In this section, RSPM (Representative Spherical Plane Method) is detailed. It is intended for grasping and manipulating an object with 3 fingertips.

To begin with, let us introduce a sphere (RSP) with radius r that represents an object. The fingertips manipulate this sphere, and the object behaves in the same way as the sphere. The RSP moves according to the positions of the 3 fingertips; therefore, the 3 fingertips only slide on the RSP while grasping the object. This movement of the RSP is defined as the behavior of the object.

Figure 3.5 Experiment 2: (a) Experimental Task (20 x 6 cm); (b) Average Time to Complete One Task (method based on gesture vs. Impetus method)

The merits of this method are as follows:

- Fine Manipulation with fingertips

The freedom of a finger is larger than that of the back of the hand, and a finger is more accurate. A finger is a "fine" part of the hand. In conventional gesture-based manipulation methods, the position and orientation of grasped objects are obtained from the back of the hand, not from the fingers. Fingers are used only for gesture detection. In contrast, with RSPM the user can drive objects with the fingertips. This means the user can manipulate objects with more accuracy.

-  Uniformity  of  Manipulation; 

The  user  can  manipulate  any  object  with  the  same  feel 
of  manipulation. 

-  Uniformity  of  Calculation; 

Any  object  can  be  manipulated  based  on  the  same  al¬ 
gorithm. 

RSPM begins when two conditions hold: all 3 fingers are involved in the object, and the 3 fingers can grasp the RSP. The second condition is checked by calculating the radius of the circumcircle of the triangle whose vertices are the 3 fingertips. It is equivalent to the radius of the circumcircle being smaller than that of the RSP.
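The second condition can be sketched with the standard circumradius formula R = abc / (4 * area). The function names are hypothetical, and the first condition (all 3 fingers involved in the object) is not checked here.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def length(a):
    return math.sqrt(sum(x * x for x in a))


def circumradius(p1, p2, p3):
    """Radius of the circle through three 3D points (inf if collinear)."""
    a = length(sub(p2, p3))
    b = length(sub(p1, p3))
    c = length(sub(p1, p2))
    area = 0.5 * length(cross(sub(p2, p1), sub(p3, p1)))
    if area == 0.0:
        return math.inf
    return a * b * c / (4.0 * area)


def can_grasp_rsp(p1, p2, p3, rsp_radius):
    """True when the fingertip triangle fits on a sphere of the RSP radius."""
    return circumradius(p1, p2, p3) < rsp_radius
```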


4.2 Behavior Calculation

The behavior of RSPM is calculated as follows.

When the fingertips grasp the object, an initial matrix that represents the position and orientation of the RSP is calculated.

While grasping, a temporary matrix that represents the RSP at that time is calculated. The transformation matrix from the initial matrix to the temporary matrix is easily calculated.

Let us focus on the fingertip triangle (FTT) formed by the 3 fingertips. When the fingertips grasp the object, the initial FTT is stored in memory. While grasping, the FTT is obtained at each time step. The transformation from the initial FTT to each FTT is equivalent to that from the object at the moment it becomes grasped to the object at each time.

If the FTT did not deform, the matrix that transforms the initial FTT to each FTT could be calculated easily. In fact, however, it deforms.

The matrix is restricted to a combination of a unitary (rotation) matrix and a translation matrix. To avoid deformation, the following calculation is used:

(1): Move the initial FTT so that its gravity center coincides with that of each FTT.

(2): Rotate it around the gravity center so that its normal vector becomes equal to that of each FTT.

(3): Rotate it around the normal vector so that its vertices become closest to the corresponding vertices of each FTT. As the performance criterion, the authors adopted the sum of the absolute angles between the vectors from the gravity center to the vertices.

The result is calculated directly in an analytical way, not iteratively. The system cycle was 60 Hz, which includes retrieval of data from the sensors, calculation, and graphics generation on an IRIS VGX-210 workstation.
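The three alignment steps for the fingertip triangle can be sketched as below. This is one possible reading, not the authors' code: all names are hypothetical, rotations use Rodrigues' formula, and step (3) takes the median of the three signed in-plane angle differences, which minimizes a sum of absolute deviations when the angles do not wrap around. The text does not state which closed form the authors used, and the sketch does not handle exactly opposite normals.

```python
import math


def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def rotate(v, axis, angle):
    """Rodrigues' rotation of v about a unit axis."""
    c, s = math.cos(angle), math.sin(angle)
    return add(add(scale(v, c), scale(cross(axis, v), s)),
               scale(axis, dot(axis, v) * (1.0 - c)))


def centroid(tri):
    return scale(add(add(tri[0], tri[1]), tri[2]), 1.0 / 3.0)


def normal(tri):
    return unit(cross(sub(tri[1], tri[0]), sub(tri[2], tri[0])))


def align_ftt(initial, current):
    """Move the initial FTT rigidly (no deformation) onto the current FTT."""
    c0, c1 = centroid(initial), centroid(current)
    n0, n1 = normal(initial), normal(current)
    # Step 1: express both triangles relative to their gravity centers.
    arms = [sub(p, c0) for p in initial]
    targets = [sub(p, c1) for p in current]
    # Step 2: tilt so that the initial normal maps onto the current normal.
    axis = cross(n0, n1)
    s = math.sqrt(dot(axis, axis))
    if s > 1e-12:
        tilt = math.atan2(s, dot(n0, n1))
        arms = [rotate(a, scale(axis, 1.0 / s), tilt) for a in arms]
    # Step 3: spin about the normal by the median signed angle difference.
    deltas = sorted(math.atan2(dot(n1, cross(a, t)), dot(a, t))
                    for a, t in zip(arms, targets))
    arms = [rotate(a, n1, deltas[1]) for a in arms]
    return [add(a, c1) for a in arms]
```

When the current triangle is an undeformed copy of the initial one, the returned vertices coincide with it; when it is deformed, the result keeps the initial shape and only approximates the current vertices.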

4.3 Experiment

A simple experiment was performed. A target cube and another cube were displayed. Each face was painted in a different color. The task was to manipulate the cube so that its orientation matched that of the target cube. Two methods were used. One was RSPM; the other was a general gesture-based method in which "if the gesture is FIST and the hand is near the object then grasp" and "the movement is the same as that of the back of the hand".

Figure 4.2 Experiment (by subject: method based on gesture vs. RSPM)

Each subject performed the task 10 times per method. The average time to complete the task was measured. Users grasped the object several times because the range of rotation was limited; they repeated grasping, rotating, and releasing. The number of grasps was also counted.

Figure 4.2 shows the result.

The average time for RSPM was shorter than that of the gesture-based method. The number of re-grasps was also smaller in the case of RSPM. This indicates that the range of rotation increased with RSPM and that RSPM was an easier method than the conventional gesture-based method.

5  Discussion  and  Future  Work 

5.1  Extending  Impetus  Method  into  Multi-finger 
Manipulation 

Currently the Impetus Method deals with the interaction between one fingertip and the object. The authors are testing an algorithm for manipulation by two fingertips. The two impetuses from the two fingertips are simply added and the result moves the object. This generates the correct behavior only in part. When one fingertip is invading the object, the object is moved by one impetus. There are cases where the moved object then contains the other fingertip. This breaks assumption 2 and causes incorrect behavior. The authors are testing several algorithms for merging multiple impetuses and several detection algorithms. Although some of them work well in some cases, a combination suitable for every case has not been found.


In some successful situations, the following sequence is achieved:

1: one finger continuously pushes the object.

2: two fingers push the object alternately.

3: two fingers push the object at once.

4: the object no longer satisfies assumption 2, the state changes into the region of kinematics, and the object is picked up by two fingers.

5.2  Discussion  on  Region  Detection  for  RSPM 

Although the experimental result shows an improvement in manipulation performance over the conventional manipulation method, RSPM leaves room for improvement. Much still remains to be done about the conditions under which RSPM begins (grasp object) and ends (release object), namely, the detection of the applicable region for RSPM.

We begin with a simple case where only RSPM serves for the manipulation calculation. In other words, the configuration space is divided into only 2 regions, free and grasped (Figure 5.1(a)). The problem is that there are cases where the object is released although the user does not intend to release it, and cases where the object is still grasped although the user intends to release it. The programmer can control the ease of grasping and releasing through the radius of the RSP. As the radius increases, the region of RSPM widens, and the object becomes easier to grasp and more difficult to release (Figure 5.1(c)). This parameter is not enough because it cannot increase or decrease the ease of grasping and the ease of releasing at the same time. By analogy, controlling the size of the region is useful for controlling the nature of manipulation, but it is not enough.

Figure 5.1 Applicable region of RSPM: (a) Configuration Space is divided into 2 regions; (b) With Smaller Radius of RSP (difficult to grasp, easy to release); (c) Larger Radius of RSP (easy to grasp, difficult to release)

Figure 5.2 Introducing Hysteresis (the region of RSPM seen from the region of Free; the region of RSPM seen from the region of RSPM)

To solve this problem, the authors introduced hysteresis (Figure 5.2). Before grasping (when the object is free), the radius of the RSP is set to r1, and when the object is grasped, it is set to r2. When we give r1 a smaller value and r2 a larger value, it becomes more difficult to grasp and more difficult to release. This does not seem to improve the feel of manipulation. By analogy, we need to find a suitable shape for the boundary line of the region.
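The hysteresis can be sketched as a two-state detector that compares the circumradius of the fingertip triangle with r1 while the object is free and with r2 while it is grasped. The class name and the use of the circumradius as the only input are assumptions made for illustration.

```python
class GraspHysteresis:
    """Grasp/release detection with a state-dependent RSP radius."""

    def __init__(self, r1, r2):
        self.r1 = r1            # RSP radius used while the object is free
        self.r2 = r2            # RSP radius used while the object is grasped
        self.grasped = False

    def update(self, fingertip_circumradius):
        """Return True while the object counts as grasped."""
        radius = self.r2 if self.grasped else self.r1
        self.grasped = fingertip_circumradius < radius
        return self.grasped
```

With r1 = 1.0 and r2 = 2.0, a fingertip triangle of circumradius 1.5 does not start a grasp but does keep an existing one, which is the behavior described as harder to grasp and harder to release.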

The discussion above concerns the region of RSPM itself. We may briefly note a further problem concerning the relation among different manipulation methods. Each method has an applicable region. As the discussion above shows, it is not easy to design even a single region. If there are several methods, it is necessary to consider the overlap between regions and the void space between 2 regions.


6  Conclusion 

First, the calculation for object manipulation was generalized.

The authors pointed out that the calculation of object behavior is defined as a set of pairs, each consisting of a calculation model and its applicable region (region detection). The types of region were classified into 3 categories: Dynamics, Kinematics, and the boundary category between them.

Second, the Impetus Method was described as a Dynamics-like method for the behavior calculation of the boundary category. It is based on a simple phenomenon, the collision between a fingertip and the object surface. The experimental result showed superiority to the conventional method.

Third, RSPM, for grasping and manipulating an object with 3 fingertips, was described as a Kinematics-like method. The experimental result showed superiority to the conventional method.

Finally, the extension of the Impetus Method to manipulation by multiple fingertips was discussed, as was the applicable region of RSPM.
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Abstract 

This  paper  proposes  a  general  framework  to  enhance  grasping 
interactions  of  an  operator  wearing  a  digital  glove.  We  focus  on  a 
consistent  interpretation  of  the  posture  information  acquired  with 
the  glove  in  order  to  reflect  the  grasp  of  virtual  artifacts.  This 
allows manipulations requiring a higher skill in virtual
environments and also improves interactions with virtual human
models.  A  handshake  case-study  highlights  the  application  range 
of  this  methodology. 

key  words  :  grasping,  virtual  human,  digital  glove 


1  Introduction 

With the advent of synthetic actors in computer animation, the study of human grasping has
become a key issue in this field. The commonly used method is a knowledge-based
approach for grasp selection and motion planning [RG91]. It can be completed with a
proximity  sensor  model  [EB85]  or  a  sensor-actuator  model  [vdPF93]  for  the  precise 
position  control  of  the  fingers  around  the  object  [MT94].  This  method  results  in  an 
automatic  grasping  procedure.  Moreover,  due  to  the  3D  interactive  tools  widely  available 
today,  we  decided  to  study  interactive  grasping  of  virtual  objects  while  wearing  a  digital 
glove  device.  Such  an  approach  is  also  interesting  for  Virtual  Reality  where  more 
elaborated  hand  interaction  is  now  possible  with  recent  generation  of  digital  glove,  way 
In  this  context,  we  map  the  real  posture  of  the  digital  glove  on  a  sensor-based  virtual 
hand  in  order  to  ensure  a  consistent  collision-free  grasping  of  virtual  objects  and  to 
provide  a  consistent  visual  feedback.  In  a  second  stage,  this  process  can  drive  the  grasp 
behavior  of  a  virtual  human  model  with  a  classical  inverse  kinematic  control  applied  to  the 
arm, or a larger fraction of the body [PB90]. More elaborate control approaches for the
arm have been proposed, but they are beyond the scope of this paper (see [L93],
[HBMTT95]).  Both  automatic  and  interactive  grasping  are  integrated  within  the  TRACK 
system  [HBMTT95]  hence  allowing  such  grasping  interaction  as  a  handshake  of  an 
operator  with  a  virtual  human  model. 

We first recall the principle of multiple virtual sensor grasping before developing the
consistent virtual grasp with the digital glove. The framework of interactively driving the
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grasp  behavior  of  a  virtual  human  model  is  then  outlined  and  a  case  study  is  presented.  A 
discussion  summarizes  the  performances,  the  interest  and  the  limitations  of  the  current 
state  of  the  system.  A  short  section  presents  the  implementation  details  prior  to  the 
general  conclusion. 

2  Automatic  Grasping  with  Multiple  Virtual  Sensors 

This section briefly recalls the interest of automatic grasping based on virtual sensors (without digital glove). Our approach is adapted from the use of proximity sensors in Robotics [EB85] and from sensor-actuator networks [vdPF93]. More precisely, we use multiple spherical sensors for the evaluation of both touch and distance characteristics, as proposed in [MT94]. They were found very efficient for the synthetic actor grasping problem. Basically, a set of sensors is attached to the articulated figure. Each sphere sensor is fitted to its associated joint shape, with different radii. The touch property of any sensor is activated whenever it collides with other sensors or objects (except the hand components). This is especially easy to compute with spherical sensors (Figure 1).
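The ease of the spherical test can be seen in a short sketch: two sphere sensors touch when the distance between their centers does not exceed the sum of their radii. The function name is hypothetical, and the exclusion of sensors belonging to the same hand is left to the caller.

```python
def spheres_touch(center_a, radius_a, center_b, radius_b):
    """True when two spheres overlap or are in contact."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    # Comparing squared values avoids a square root.
    return dist_sq <= (radius_a + radius_b) ** 2
```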


Figure  1.  The  virtual  hand  with  sphere  sensors  (a);  while  grasping  a  sphere  (b,  c) 


The  sensor  configuration  is  important  in  our  method  because,  when  the  touch  property  of 
a  sensor  is  activated  in  a  finger,  only  the  proximal  joints  are  locked  while  distal  joints  can 
still  move.  In  such  a  way,  all  the  fingers  are  finally  wrapped  around  the  object,  as  shown 
in  Figure  lb,c.  When  grasping  a  free  form  surface  object,  the  sphere  sensors  are 
detecting  collision  with  the  object.  We  do  not  discuss  more  on  collision  detection  which 
is  beyond  the  scope  of  this  paper  (see  [MT94],  [K93]). 

The automatic grasping methodology is the following [MT94]: first, a strategy is selected according to the type and the size of the object to grasp. A target location and orientation is determined for the hand frame and is realized with the well-known inverse kinematics approach. The next stage is to close the fingers according to the selected strategy (e.g. pinch, wrap, lateral, etc.) while sensor-object and sensor-sensor collisions are detected. Any touch detection locks the joints on the proximal side of the associated sensor. The grasping is completed when all the joints are locked or have reached their upper or lower limit.

3  Interactive  Grasping  with  a  Digital  Glove 

When  an  operator  is  wearing  a  digital  glove  the  joint  values  acquired  with  the  device  are 
normally  used  to  update  the  visualization  of  the  hand  posture.  Such  approach  is  pertinent 
as  long  as  the  device  is  used  to  specify  commands  through  posture  recognition  [SZF89]. 
Among  these  commands  there  usually  is  a  symbolic  grasp  of  virtual  objects  where  the 
relative  position  and  orientation  of  the  object  is  maintained  fixed  in  the  hand  coordinate 
system  as  long  as  the  grasp  posture  is  recognized.  Such  approach  is  suitable  for  pick- 
and-place  tasks  of  rigid  objects  and  it  is  not  our  purpose  to  change  it. 

However, as virtual environments become more and more complex, especially with the
advent of virtual humans, hand-based interactions also grow in complexity. The
limitation of the current approach comes mostly from the rough relative positioning of
the hand and the object, which does not convey a clear understanding of the action to be
performed with the object. As everybody has experienced, we adopt different grasping
postures for the same object according to the function we intend to perform with it,
because these different tasks involve different degrees of mobility (e.g. giving or
using a screwdriver). Moreover, immersion and interaction in a virtual environment may
not reduce to a matter of pick and place but may also call for abilities requiring
greater skill. In such a case the hand associated with the digital glove device provides
a high-dimensional space not only as a posture space (for command recognition) but also
as a goal-oriented space (for precise manipulation or modification of virtual objects
with additional tools). For example, interaction with non-manufactured deformable
entities (articulated or continuously deformable objects) is best performed with direct
hand interaction, as the hand is our most elaborate tool for translating design
intention into action. In such a context we need to evaluate precisely the contact
locations between the hand and the object.

3.1  Interactive  Grasping  Automata 

Within this extended application context of the digital glove, it becomes crucial to
display a posture of the hand consistent with the ongoing manipulation of the virtual
object. For this reason, we now propose a new approach for the interactive and
consistent grasping of virtual entities: the interactive grasping automata (Figure 2).


Figure  2 :  The  interactive  grasping  automata 

In our method we consider three different states of interactive grasping :

FREE_HAND : the hand is freely moving in space without holding any object. The
hand posture is displayed as measured with the device. Whenever the hand bounding
sphere intersects the object bounding sphere, we enter the "GRASPING_in_progress"
state.

GRASPING_in_progress : the touch property of the sensors is continuously
evaluated to adjust the posture of the fingers colliding with the object to grasp (the
object is fixed or moving in a world coordinate system). Whenever the hand bounding
sphere no longer intersects the object bounding sphere, we return to the "FREE_HAND"
state. On the other hand, if our simplified grasp condition is established, i.e. at
least the thumb and one finger maintain a durable contact with the object, we enter the
"SECURE_GRASP" state.

SECURE_GRASP : the touch property of the sensors is still used to continuously
adjust the posture of the fingers colliding with the grasped object (the object position
is fixed in the hand coordinate system). As soon as the simplified grasp condition
vanishes, we return to the "GRASPING_in_progress" state.
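
The three states and their transitions can be sketched as a small transition function.
This is only an illustration in Python, not the C implementation described in this
paper: the names GraspState, grasp_condition and next_state, and the boolean inputs,
are assumptions introduced here, and the "durable" aspect of the contact is not
modelled.

```python
from enum import Enum


class GraspState(Enum):
    FREE_HAND = "FREE_HAND"
    GRASPING_IN_PROGRESS = "GRASPING_in_progress"
    SECURE_GRASP = "SECURE_GRASP"


def grasp_condition(thumb_in_contact, fingers_in_contact):
    """Simplified grasp condition: the thumb and at least one finger touch the object."""
    return thumb_in_contact and any(fingers_in_contact)


def next_state(state, spheres_intersect, grasp_ok):
    """Return the automaton state for the next time step.

    spheres_intersect: the hand and object bounding spheres overlap (hypothetical input).
    grasp_ok: result of grasp_condition() for the current time step.
    """
    if state is GraspState.FREE_HAND:
        return GraspState.GRASPING_IN_PROGRESS if spheres_intersect else state
    if state is GraspState.GRASPING_IN_PROGRESS:
        if not spheres_intersect:
            return GraspState.FREE_HAND
        return GraspState.SECURE_GRASP if grasp_ok else state
    # SECURE_GRASP: fall back as soon as the grasp condition vanishes.
    return state if grasp_ok else GraspState.GRASPING_IN_PROGRESS
```

Calling next_state once per time step with the current sensor readings reproduces the
cycle FREE_HAND, GRASPING_in_progress, SECURE_GRASP and back.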

3.2  Hand  Posture  Correction 

Unlike the automatic grasping procedure, the interactive grasping procedure adjusts the
hand posture by opening it rather than closing it. Even with the most recent generation
of digital gloves it is difficult to adjust the grasp precisely enough that the fingers
establish a permanent contact without penetrating the virtual object. This is because
we only have visual feedback, without any force or touch feedback. However, it would be
misleading to conclude that the automatic grasping procedure should also apply in this
context. Basically, interactive grasping means ensuring the greatest autonomy for the
operator and providing means of correction rather than removing degrees of freedom.

It is more comfortable for the operator to move and close the hand freely according to
the visual feedback of the virtual environment. So our working hypothesis is to rely on
the operator to keep the grasping fingers closed slightly more than theoretically
necessary. In this way, the opening correction approach establishes a durable contact
which overcomes the unavoidable small variations of hand posture and position
(Figure 3). An optional Assisted Folding mode is also provided to guide the operator in
searching for the proper grasp posture. In this mode, any sensor initially situated
between the first colliding sensor and the finger tip is brought to be tangent to the
object. If the sensor intersects the object then the associated joint is opened,
otherwise it is closed. In this way, the distal part of a colliding finger consistently
wraps around the object (Figure 4). The correction algorithm is characterized by an
opening-wrapping adjustment loop, with optional Assisted Folding, for each colliding
finger. So, for each time step, we have :

For each colliding finger
    For each sensor distal to the colliding one closest to the base
    (from base side to tip side of the finger)
        If the sensor is currently colliding
            Unfold the closest proximal joint (wrist side)
            until the sensor is tangent to the object
            or the joint reaches its limit
        Else
            If in Assisted Folding Mode
                Fold the closest proximal joint (wrist side)
                until the sensor is tangent to the object
                or the joint reaches its limit
            EndIf
        EndIf
    EndFor
EndFor
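
As a rough illustration of the loop above, the following Python sketch applies it to a
single planar finger and a circular object. Everything specific in it is an assumption
made for the example rather than a detail of our implementation: the 2D kinematics, one
sensor at the distal end of each link, the constants LINK, SENSOR_R, STEP and LIMITS,
and the fixed-step angular search that stands in for an exact tangency computation.

```python
import math

LINK = 1.0            # link length (arbitrary units, assumed equal for all phalanges)
SENSOR_R = 0.1        # sensor radius
STEP = 0.01           # angular search step in radians
LIMITS = (-0.2, 1.6)  # (lower, upper) joint limits in radians, illustrative only


def sensor_positions(angles):
    """Planar forward kinematics: one sensor at the distal end of each link."""
    x = y = heading = 0.0
    points = []
    for a in angles:
        heading += a  # positive angles fold the finger (counter-clockwise)
        x += LINK * math.cos(heading)
        y += LINK * math.sin(heading)
        points.append((x, y))
    return points


def collides(angles, i, centre, radius):
    """True if sensor i penetrates the circular object."""
    px, py = sensor_positions(angles)[i]
    return math.hypot(px - centre[0], py - centre[1]) < radius + SENSOR_R


def correct_finger(measured, centre, radius, assisted_folding=False):
    """One time step of the opening-wrapping loop for a single finger."""
    angles = list(measured)
    lo, hi = LIMITS
    hits = [i for i in range(len(angles)) if collides(angles, i, centre, radius)]
    if not hits:
        return angles  # not a colliding finger: display the measured posture
    for i in range(hits[0], len(angles)):  # base side to tip side
        if collides(angles, i, centre, radius):
            # Unfold joint i until sensor i is (nearly) tangent or the limit is hit.
            while collides(angles, i, centre, radius) and angles[i] - STEP >= lo:
                angles[i] -= STEP
        elif assisted_folding:
            # Fold joint i for as long as the next step keeps sensor i outside.
            while angles[i] + STEP <= hi:
                trial = angles[:i] + [angles[i] + STEP] + angles[i + 1:]
                if collides(trial, i, centre, radius):
                    break
                angles[i] += STEP
    return angles
```

Because sensor i depends only on joints 0 to i, correcting the joints from base to tip
never disturbs a sensor that has already been released, which is why a single pass per
time step is enough in this sketch.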

Figure 3 details all the stages of the opening-wrapping algorithm for one finger
colliding with an elliptic shape (in 2D for clarity). In the example, the joints are all
opened in succession because the distal sensors (on the finger tip side) are still
colliding even after the correction of the proximal ones (on the wrist side). The
algorithm begins by unfolding the finger base joint to release the first colliding
sensor (Figure 3a,b). Then it unfolds the next joint to release the following sensor
(Figure 3b,c), and the same occurs for the last joint (Figure 3c,d). In this case the
final finger posture consistently wraps around the object.




Figure  3  :  an  example  of  the  opening-wrapping  adjustment  loop  for  interactive  grasping 

The Assisted Folding mode is especially useful whenever the tip side of the finger is
not in the operator's field of view (Figure 4). This happens for a large class of
grasping postures and objects. In this mode the operator is given a hint about the full
grasp posture of the colliding fingers. Then, from that continuous visual feedback,
he/she can adjust the hand position, orientation and posture in order to perform the
desired grasp. In the example of Figure 4, the first corrected sensor is released by
opening the associated joint, while the next two distal sensors are brought to be
tangent to the rectangular shape by closing their associated joints.




Figure  4  :  an  example  of  the  opening-wrapping  adjustment  with  assisted  folding 

4 Integration within the TRACK system

The  TRACK  animation  system  is  dedicated  to  human  animation  design  [BHMTT94]. 
More  recently  we  have  begun  to  evaluate  the  potential  of  virtual  environments  through  3D 
interaction  with  virtual  humans.  As  such,  both  grasping  approaches,  automatic  and 
interactive,  are  key  features  to  integrate  within  the  TRACK  system.  We  now  present  the 
current  state  of  this  integration  and  outline  the  progressive  intermixing  of  both  techniques 
in  order  to  allow  complex  grasping  interaction. 

- Automatic grasping of static volumic primitives and polygonal surfaces has been
defined for both one-hand and two-hand grasping for the virtual human (Figure 1)
[roMTT95], [MT94]. So it is already possible for virtual humans to exchange a
handshake by activating their hand motion and automatic grasp in sequence
rather than simultaneously (Figure 5).

- Autonomous interactive grasping is already performed on volumic primitives of
the virtual environment and is to be extended to polygonal surfaces.

- Guiding one virtual human's hand with a 6D device such as the Spaceball or a digital
glove is an alternative to the knowledge-based selection of the grasp posture
and of its positioning relative to the object. The automatic grasping closure is then
performed to establish a wrapping grasp based on sensor collision detection
while closing the fingers. In such a context, the device must stay within the reachable
area of the virtual human's hand, otherwise the virtual human has to be displaced
globally.

- Finally, the most complete intermixing of both techniques is to fully map the
digital glove position and posture onto the virtual human's hand and to manage it
according to the interactive grasping approach. In this way the operator can fully
handle the grasping function of one virtual human model and interact with other
virtual humans. The other virtual humans can in turn respond to the operator
according to the automatic grasping approach as long as the grasping target is not
moving. For moving objects, as in a handshake, the operator's virtual hand
is the target of the virtual human's hand and both have to use the interactive grasp
with assisted folding. In this way the virtual human closes its fingers only when
colliding with the operator's hand. Moreover, this requirement copes with the
operator's hand postural changes and simultaneous position variations.


Figure 5 : handshake between virtual actors

5 Results

Two interactive grasping experiments are presented here. They deal with the grasping of
regular volumic primitives, namely a sphere and a rectangular volume, of various sizes.
First, Figure 6 shows the hand model (as provided with the digital glove library from
Virtual Technologies). We limit the size of the virtual objects to grasp in order to
permit single-hand grasping. The following figures show simultaneous views of the real
hand posture as acquired with the digital glove and of the one displayed as the result
of the interactive grasping approach. The sensors are displayed as cubes to reduce the
number of polygons.

6 Discussion

The interactive grasping experiments ran at an interactive rate of approximately 10
images per second on a workstation screen, with the management of up to 20 spherical
sensors and the display of the high-resolution hand (2600 polygons), the object and the
sensors. Although a small delay was perceptible between the performance and the display
of the corrected hand posture, it was not a decisive factor.

Basically, interactive grasping proved to be a valuable tool for the interactive design
of realistic gripping postures for a human animation system. The interactive refresh
rate is the main criterion for that purpose and our experiments have demonstrated this
capability. So, from the visual feedback of the corrected grasp, an operator is able to
achieve a good perception-action control loop in order to continuously adjust the grasp
into a realistic posture. In our default setting the display is completed with color
coding information, as follows. Each state of the interactive grasping automata is
associated with a different color of the whole hand : pale skin color for FREE_HAND,
blue for GRASPING_in_progress and green for SECURE_GRASP. Moreover, whenever a
sphere sensor is colliding, its color changes from dark gray to light gray. This
additional symbolic information significantly enhances the operator feedback.

Regarding Virtual Environment applications, the realism of the grasping posture may not
be an essential factor in favor of our approach. In such a context the point is often
not to behave exactly as in the real world. A VE operator is more concerned with greater
manipulation skill than with a more realistic posture. In that respect our approach also
improves the manipulation of objects by comparison with the well-known symbolic
grasp. In the symbolic context, the operator has to perform well-defined hand postures
(at least two distinct ones) so that the recognition system properly separates the grasp
from the release commands (not to mention that the object should be selected first). So,
whenever the operator wishes to precisely modify the relative position and orientation
of the object with respect to the hand (this is the proper definition of a
manipulation), he/she has to grasp it, reorient the hand-object system, release the
reoriented object and move the hand freely to a new relative orientation. In our
approach the reorientation is still performed in that way, but the grasp is established
much more simply, just by touching the object with the thumb and another finger. Such a
procedure is much easier and faster than the one based on posture recognition.

As just mentioned, the finger manipulation skill is limited to setting the beginning and
the end of the grasp. Modifying the relative orientation of the object with respect to
the hand coordinate system is managed in the same way as in symbolic grasping (see
above). This procedure is tractable as long as we perform a light grasp involving a
small number of fingers. Otherwise, the operator has to repeatedly close and open the
hand, which can rapidly become uncomfortable.

7  Implementation 

We can use either the DataGlove from VPL or the CyberGlove from Virtual
Technologies. The latter was used for the performance evaluation throughout the
present experiments. The hand polygonal model provided by Virtual Technologies comes
from the 3-D Dataset of Viewpoint DataLabs. The position and orientation of the
CyberGlove were acquired with one bird sensor from the "Flock of Birds" device of
Ascension Technologies. The interactive grasping was computed on a Silicon Graphics
Indigo2 Extreme. The virtual human and interactive grasping software are written in C.

8  Conclusion 

We have studied interactive grasping of virtual objects to improve the goal-oriented
interactions of an operator wearing a digital glove device. The interactive grasping
approach with the opening-wrapping algorithm ensures a consistent collision-free
grasping of virtual objects, which proves to be valuable visual feedback for the
operator and hence allows tasks involving a higher skill than before to be managed.
Encouraging performance on a standard graphics workstation opens the way for
integration in fully immersive systems.

Interesting applications of this technique appear as virtual human models begin to
invade virtual environments and as the digital glove begins to enter animation systems
dedicated to human animation design. For example, our approach can drive the grasp
behavior of a virtual human model in order to simplify all the grasping studies for
production. Both automatic and interactive grasping are integrated within our TRACK
animation system.

Second EUROGRAPHICS Workshop on Virtual Environments, Monte-Carlo 1995

Abstract

The enhancement of a virtual reality environment with a speech interface
is described. Some areas where the virtual reality environment benefits
from the spoken modality are identified, as well as some where the
interpretation of natural language utterances benefits from being situated
in a highly structured environment. The issue of interaction metaphors for
this configuration of interface modalities is investigated.


Introduction 

Virtual reality interfaces sometimes seem to be thought of as embodying a
return to a natural way of interaction - the way we interact with the real
world¹. The interaction metaphors already introduced for VR (with some
trimming and tuning and the addition of proper tactile feedback...) would
then be sufficient for interaction. No learning would be required, as
opposed to traditional interfaces - the natural interaction mechanisms are
all there. This is a familiar mistake: it has been made repeatedly in the
natural language-processing community. Not until recent years has it been
widely acknowledged that conventions from other human activities do not
always carry over directly to interactions with computer systems. We will
give some examples to show similar oversimplifications regarding virtual
reality technology.

¹ “[We are] on our own again, after the long mediation of top-down authored
experience” Brenda Laurel, WIRED 1.6




"Select  the  grey  marbles." 


Figure  1:  Just  point  and  click. 


The Naming Of Things Is A Serious Matter

“This” and “that” used deictically are physical world concepts easily
defined and formalized for virtual reality interfaces in the form of direct
manipulation mechanisms. However, they constrain their users to the here
and now, even if “here” and “now” may be defined differently than in
physical reality. Human languages are by design a step beyond “this”,
“that”, “here”, and “now”. They allow the user to refer to entities other
than concrete objects, using set conventions: abstract concepts
(“reality”), actions (“eating”), objects that are not here (“the dog ...”),
objects that are not present now (“last month’s salary”), objects that
cannot exist (“perpetuum mobile”), and objects selected for a property
(“slow things”). In general, rendering the domain of interaction in terms
of physical objects is not always appropriate - many things are difficult
to portray².

Karlgren, Bretan, Frost and Jonsson: Speech in VR


“Where is the paper about virtual reality I sent to CHI last fall?”


Figure  2:  Try  this  with  gestures. 


Virtual  metaphors  are  conventions 

The virtual world does not need to obey the laws of the physical: in the
real world, language is a means to change the world, and in a virtual world
the world will be easier to change. Take something as simple as a virtual
table. Unlike its physical relative, it can change to accommodate the
preferences of the user. Similarly, the virtual world can be instructed to
transport us to somewhere in the virtual space. Naturally, metaphors - a
virtual saw, a virtual pot of paint, a flying carpet, superpowers - to do
this with could be introduced, but they will not be more natural or less
conventionally bound than use of language would be; on the contrary.


“Paint the table red and make it round.”

“Take me to the moon.”

Figure 3: Manipulating the world with language.


[Figure 4, diagram not reproduced. Input comes from the keyboard or the
microphone. Processing stages as listed in the diagram: speech recognition;
syntactic tagging; conversion of ENGCG output to Prolog terms; dependency
parsing; semantic analysis and reference resolution; execution of the
Prolog formula; Prolog/DIVE interface; the virtual environment. Data labels
in the diagram: text, feature structures, Prolog formula, commands.]

Figure 4: System architecture.


² This is the point of playing charades.

System sketch

Our system - DIVERSE (DIVE Real time Speech Enhancement) - is a speech
interface to a generic virtual environment based on DIVE (Distributed
Interactive Virtual Environment) that can be used with complex worlds
modelled in a variety of formats (Carlsson and Hagsand, 1993). DIVERSE
allows a user to select and manipulate objects in the world and move about
in it. DIVERSE is implemented as a cascaded sequence of components. Speech
recognition is done by means of a Hidden Markov Model system - HTK - which
has been trained for the domain (Woodland et al., 1994). Text processing is
performed by a general-purpose surface syntactic processor - ENGCG - which
identifies syntactic roles and dependencies in the text (Karlsson, 1990). A
resulting dependency graph is translated to a logical representation, which
in turn is inspected for references to entities and objects and matched to
the set of conceivable and possible actions. The resulting queries or
commands are then sent to DIVE, which manipulates or queries the world
accordingly.
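
The cascade itself can be pictured as plain function composition. The Python
sketch below is purely illustrative: the stage functions are toy stand-ins
invented for this example and do not call HTK, ENGCG, Prolog or DIVE, and
the command syntax they produce is likewise an assumption.

```python
from functools import reduce


def run_cascade(stages, signal):
    """Feed the output of each stage into the next, in order."""
    return reduce(lambda data, stage: stage(data), stages, signal)


# Toy stand-ins for the real components (all hypothetical).
def recognise(audio):
    """Speech -> text: here the transcript is simply looked up."""
    return audio["transcript"]


def tag(text):
    """Text -> (word, role) pairs: the first word is taken to be the verb."""
    words = text.rstrip(".!").lower().split()
    return [(w, "verb" if i == 0 else "arg") for i, w in enumerate(words)]


def to_logic(tagged):
    """Tagged words -> logical form (verb, arguments), dropping articles."""
    verb = tagged[0][0]
    args = [w for w, _ in tagged[1:] if w not in ("the", "a")]
    return (verb, tuple(args))


def to_command(formula):
    """Logical form -> command string for the virtual environment."""
    verb, args = formula
    return f"{verb}({', '.join(args)})"


PIPELINE = [recognise, tag, to_logic, to_command]
```

With these stand-ins, run_cascade(PIPELINE, {"transcript": "Paint the house
red."}) yields the string "paint(house, red)". The point of the design is
that each component only needs to agree with its neighbours on one
intermediate representation.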

Interaction  Metaphor 

There is no obvious counterpart to the user for dialog with a system in a
speech controlled virtual environment. There are several conceivable
interaction models:

The basic metaphor of virtual environments is that of Personal Presence:
the user is embodied in the real world through an actor or entity in it.
This model poses problems for speech interaction - who will the user
address? (“I now want to paint the house red...”) This metaphor can be
extended to that of Proxy, where users in effect ride on the back of a
virtual entity. Users share the perspective, and can address and control
their proxies at will: “Sindbad: paint the house red!”. Alternatives
similar to the proxy are the closely related metaphors of Divinity, where
users give commands as a god to no obviously present counterpart but
instead to the world itself: “Paint the house red!” or even “Let the house
be red!”; or that of Prayer, where users address commands in a similar
fashion to a god.

Another extension of the basic metaphor of personal presence is that of
Telekinesis, where the objects and entities of the world themselves can be
counterparts and interlocutors to users: “House, open your door!”.
Drawbacks include (1) the ability of an object or set of objects to
participate in a dialog is far from obvious; (2) talking to objects not yet
in the world will not be natural: “Three small red cubes, create
yourselves!”; and (3) the need for object independent communication: “Take
me home”. Of course, the last types of message could be addressed to some
type of meta-object: a creation object or transportation object - in any
case, the counterpart would be highly convention-bound.


Figure 5: Interface snapshot with agent to the right.


A different type of interaction metaphor is that of an Agent. The agent
model is different from other models in that it requires a separately
rendered autonomous entity with communicative capabilities. The users will
find a virtual, visually present, assistant or agent to interact with. This
is necessary to be able to integrate visual and spoken feedback naturally;
with no feedback or interlocutor, the interaction situation would most
likely be very unfamiliar and difficult to make use of. This is the
interaction model we have chosen for our implementation of DIVERSE. A
consequence of machine use of a single interlocutor is that the system’s
linguistic competence can be modelled in this agent through its visual
characteristics, its gestures, its language, and so on - this will
encourage convergence in one direction. Accordingly, the DIVERSE agent has
been provided with a simple vocabulary and a small set of gestures.

Reference resolution - pragmatics


One of the most challenging problems of language understanding is that of
reference resolution: of tracking what referents referential expressions
refer to.

We are not even sure of what the characteristics of referents are: we have
reasonable evidence from text studies that referring expressions in the
text do not refer directly to other expressions in the text itself, but to
referents outside it (e.g. Brown & Yule, 1983); similarly we have
reasonable evidence that referring expressions do not refer directly to the
“world”, “knowledge base” or whatever we posit to be the “reality” that the
discourse is “about”, but to some intermediate level, which we in the
following will call discourse referents. We will make no claims about the
characteristics of such referents: in our implementation, with the
exceedingly simple task and object structure, we have as yet had no need to
implement an intermediate level. Our operations apply directly to the
world.

Resolving which discourse referent a speaker or writer refers to is
non-trivial: usually there are several possible candidates. In the general
case, knowledge of the domain in addition to syntactic information and
access to the discourse and other aspects of the situation that the
language use occurs in are usually necessary. Brown and Yule (1983), e.g.,
mention several approaches involving multiple knowledge sources; an
implementation by LuperFoy (1991) lists nine different sources her
algorithms utilize, including Recency, Global Focus, various grammatical
and lexical features, and some knowledge oriented features.

The knowledge sources used in the various approaches can roughly be
categorized into two types: 1) situation specific features: recency, focus,
and formal features of the referring expression; and 2) encyclopaedic
features, involving different kinds of world knowledge.

In DIVERSE we only have partial encyclopaedic information. We have full
knowledge of what objects exist in the world, and we have a certain
hierarchical organization of objects with subparts, but there is no
representation of object relations, roles, and world characteristics. What
we do have is an excellent representation for discourse tracking, to
analyze focus.


To concretize, the problem we need to solve is that of resolving what the
referring expression “the house” in the user utterance “Paint the house
black.” refers to, as in figure 6; what the referring expression “it”
refers to in the user utterance “Move it to the left.”, as in figure 7; and
what the referring expression “a cube” refers to in the user utterance
“Take me to a cube.”, as in figure 8.

The attentional state of the system is easily modeled by visual focus and
the highlighting mechanism of DIVE; this means that where a pure text based
system might have to deliberate about different candidate cubes - given
that there are several in the world - in the example interaction in
figure 7, a multimodal system will have a less vague situation with the
pictorial accompaniment shown in figure 8.

Move  me  close  to  a  cube. 

Figure  7:  What  does  “a  cube”  refer  to? 

We give each object in the world a focus grade, based on recent mention,
highlightedness, gestural manipulation by the user, and above all, visual
awareness. Definite noun phrases and pronouns are handled similarly, for
now: the set of candidate referents is constrained by focus grade, and the
candidate with the highest focus grade is chosen as a referent; for
indefinite noun phrases the entire world is allowed as candidate set.
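
A minimal sketch of this selection rule, written in Python for illustration
only: the WorldObject record, the resolve function, the numeric focus scale
and the min_focus threshold (one possible reading of "constrained by focus
grade") are all assumptions of the example, not parts of DIVERSE.

```python
from dataclasses import dataclass


@dataclass
class WorldObject:
    name: str
    kind: str
    focus: float  # focus grade; higher means more salient (scale is an assumption)


def resolve(kind, definite, world, min_focus=0.5):
    """Return candidate referents for a noun phrase of the given kind.

    Definite noun phrases and pronouns (kind=None matches any object) yield
    the single candidate with the highest focus grade among those at or
    above min_focus; indefinite noun phrases yield every matching object.
    """
    matches = [o for o in world if kind is None or o.kind == kind]
    if not definite:
        return matches
    focused = [o for o in matches if o.focus >= min_focus]
    return [max(focused, key=lambda o: o.focus)] if focused else []
```

For a world holding two cubes with focus grades 0.9 and 0.2, “the cube” and
“it” both resolve to the first cube, while “a cube” leaves both cubes as
candidates.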

So, primarily, if an object is in the perceptual focus in the virtual
environment, i.e. the agent has a high degree of awareness of it (Benford
and Fahlen, 1993; Benford et al., 1994), it is a prime candidate for
reference.

One of the actions available to users is to select or point at an object.
An object which the user points at gets a high focus grade, with a rapid
rate of focus decline after the pointing gesture has been completed. The
command “Select object!” or even just “Object!” highlights the object. This
is intended to be a method for users to pick out referents before issuing
commands that process them. In addition, objects can also be highlighted as
a consequence of a previous command.

Thirdly, we keep track of which objects have been referred to recently. If
an object is in the textual discourse focus, i.e. in the recent dialog
history, it is a strong candidate for reference. An important design issue
is how the dialog history is represented. To encourage users to refer to
previously mentioned or manipulated objects, the discourse history can be
made explicit: presumably the representations of likely candidates for
reference will influence the actual references made. This must be studied
empirically, with different DIVERSE implementations being compared to one
another.

Figure 8: Now, what does “a cube” refer to?

The evidence from gestures, awareness status, previous commands, and
discourse history is added together to determine which object is the one
most likely to have been referred to. We have not yet determined exactly
what the respective weighting of these methods will be: this is partly an
empirical question.
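
One way to picture this additive scheme is a weighted sum over evidence
channels, with the pointing evidence fading over time. The Python sketch
below is hypothetical throughout: the channel names, the weights, and the
exponential decay with a two-second half-life are placeholders chosen for
the example, since the actual weighting is still open.

```python
# Illustrative weights only: the text leaves the actual weighting open.
WEIGHTS = {"awareness": 4.0, "gesture": 2.0, "highlight": 1.5, "mention": 1.0}


def gesture_evidence(seconds_since_pointing, half_life=2.0):
    """Pointing evidence that starts at 1.0 and halves every half_life seconds."""
    return 0.5 ** (seconds_since_pointing / half_life)


def focus_grade(evidence, weights=WEIGHTS):
    """Weighted sum of the evidence channels, each expected in [0, 1]."""
    return sum(weights[channel] * value for channel, value in evidence.items())
```

With these placeholder weights, an object the user is currently looking at
outranks one that was pointed at four seconds ago and mentioned recently,
which matches the priority given to visual awareness.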

Typical  problems  for  text  based  reference 
studies  are  that  the  prototypical  case,  where 
a  definite  noun  phrase  refers  to  previously 
introduced  referents  and  indefinites  intro¬ 
duce  new  referents,  is  not  that  frequent 
(Fraurud,  1990).  Thus,  any  algorithm  for 
finding  a  referent  for  a  definite  noun  phrase 
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will  need  a  fair  amount  of  world  knowledge 
to  pick  a  contextual  sponsor  or  anchor  for 
the  referential  expression.  We  have  found 
that  the  visual  awareness  factor  overrides 
the  importance  of  most  other  channels,  so 
that  in  an  interaction,  objects  can  be  intro¬ 
duced  as  salient  just  by  looking  at  them. 
If  the  user  moves  to  look  at  a  tree,  and 
then  says  “Move  the  tree  to  the  left.”  it 
is clear which tree is meant. And, if visual
awareness is given priority over other
sources, the feedback will always
tell users what is going
on.

A  typical  view  of  the  drawbacks  of  natural 
language  as  an  interface  tool,  be  it  keyboard 
entered or spoken, compared with direct manipulation
is given by Cohen (1992): “... another
disadvantage [of natural language input]
is that reference resolution algorithms
do  not  always  supply  the  correct  answer  in 
part  because  systems  have  underdeveloped 
knowledge  bases,  and  in  part  because  the 
system has little access to the discourse situation
the user finds himself in, even if the
system’s prior utterances and graphical presentations
have created that discourse situation,
... These ... world knowledge limitations
undermine the search for referents of
anaphoric  expressions  and  provide  another 
reason  that  natural  language  systems  are 
usually  designed  to  confirm  their  interpre¬ 
tations.” 

Bos et al. have implemented EDWARD, a
text and direct manipulation operating system
for workstations (1994). They note that
users  sometimes  lose  track  of  selected  ob¬ 
jects:  “we  found  ...  users  not  always  being 
aware  of  the  state  of  the  model  world:  the 
markedness  of  objects  selected  a  while  ago 
was  sometimes  forgotten  or  overlooked.”  In 
DIVERSE  we  may  be  able  to  expect  slightly 
better  user  attention  —  visual  awareness  is 
much  better  determined;  the  view  is  fixed 
in  EDWARD,  whereas  the  user  can  change 
the  view  in  DIVERSE,  and  as  the  visual  fo¬ 
cus  overrides  selection  and  highlighting  of 
objects, a DIVERSE user can be expected
to be more aware of the state of the model
world and markedness of objects. Whatever
the case may be on that count, Bos et al. note
that  the  mistakes  the  system  makes  do  not 
seem  to  faze  users;  the  errors  are  interactive 
enough  for  the  user  to  accept  them.  Thus 
they  partly  answer  Cohen’s  objections:  in 
a  highly  interactive  environment,  errors  do 
not  matter;  at  least  if  the  interface  is  honest 
about  its  abilities  and  cooperative  as  to  dis¬ 
playing  them.  In  our  design,  feedback  is  not 
a  matter  of  asking  the  user  for  confirmation, 
but  a  view  of  system  actions. 

Errors  do  not  matter 

The  interactive  design  of  the  DIVERSE  in¬ 
terface  is  related  to  recent  trends  in  natu¬ 
ral  language  interface  research,  where  the 
underlying  problem  of  interactive  interfaces, 
especially  natural  language  interfaces,  today 
is identified as that of a low degree of interactivity,
or “one-shot” interaction, where users
believe, regardless of system competence,
that  systems  expect  them  to  pose  queries  in 
one  go  (Bretan  and  Karlgren,  1993). 

The  conversational  competence  users  ex¬ 
pect  from  computers  is  extremely  simple, 
which  has  been  shown  in  a  number  of  studies 
of  natural  language  interfaces.  This  is  specif¬ 
ically  true  for  discourse  structure,  which  has 
been  shown  to  be  modellable  by  an  exceed¬ 
ingly  simple  dialog  grammar,  by  examin¬ 
ing  the  discourse  structure  of  material  ob¬ 
tained  in  Wizard  of  Oz  simulation  studies 
(Dahlback,  Jonsson,  and  Ahrenberg,  1993). 
This  can  be  explained  by  a  fundamental 
asymmetry  of  beliefs  between  user  and  sys¬ 
tem  (Joshi,  1982).  Users  do  not  expect  com¬ 
puter  systems  to  take  responsibility  for  the 
coherence  of  a  discourse,  but  expect  to  take 
full  responsibility  for  the  discourse  manage¬ 
ment  themselves.  This  is  in  contrast  with 
naturally  occurring  dialog  which  is  not  only 
interactive but also incremental, i.e. in a
form  where  both  parties  cooperatively  build 
up  referents  and  references  during  the  course 
of  a  discourse. 

To change this, the system must somehow
display  and  make  explicit  what  information 
it  has  for  the  user  to  refer  to,  and  what  as¬ 
sumptions  about  user  intentions  it  makes; 
at  the  current  point  of  sophistication,  a  high 
degree  of  interactivity  and  added  communi¬ 
cation  channels  to  the  system  is  arguably  a 
better  tool  for  raising  system  usefulness  than 
adding  functionality  or  intelligence  to  the  ex¬ 
isting  channel,  be  it  text,  speech,  or  a  rule 
based  system  (Chandrasekar  and  Ramani, 
1989;  Lemaire  and  Moore,  1994;  Karlgren 
et  al,  1994). 

As  indicated  in  the  previous  section,  in 
DIVERSE  we  make  use  of  the  errors-do-not- 
matter  principle  to  the  extent  that  we  will 
not  worry  about  the  system  misinterpret¬ 
ing  the  occasional  user  utterance:  as  long  as 
the  interface  is  interactive  we  do  not  expect 
misinterpretations to be too crucial a problem.
More important than error handling is
a  broad  acceptance  of  user  utterances:  every 
utterance  should  produce  some  effect. 

The  representation  of  the  utterance  is 
matched  to  representations  of  possible  ac¬ 
tions  in  the  domain.  If  no  good  match  is 
found, any referents that have been identified
in the utterance are highlighted anyway,
to help users continue the discourse
rather than start from square one
again. This is similar to recent ideas about
how  to  generally  design  a  natural  language 
interface,  using  “non-threatening  error  mes¬ 
sages  that  reiterate  vocabulary  and  phrases 
the  processor  understands.”  (Zoltan-Ford, 
1991). 
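
The matching step with its highlighting fallback might look like the sketch below. It is a deliberately naive keyword matcher written for illustration; the function name, the action table and the object names are hypothetical and do not come from DIVERSE.

```python
# Hypothetical domain: verbs mapped to action identifiers, and known objects.
ACTIONS = {"move": "MOVE", "select": "SELECT"}
OBJECTS = {"tree", "cube", "house"}


def interpret(utterance, actions=ACTIONS, objects=OBJECTS):
    """Match an utterance to a domain action.

    Returns ("execute", action, referents) on a good match, otherwise
    ("highlight", referents) so that recognised referents are still
    shown to the user and the dialogue can continue from there.
    """
    words = set(utterance.lower().replace(".", "").split())
    referents = sorted(obj for obj in objects if obj in words)
    for verb, action in actions.items():
        if verb in words and referents:
            return ("execute", action, referents)
    # No good match: highlight whatever referents were identified.
    return ("highlight", referents)


result = interpret("Move the tree to the left.")
fallback = interpret("Paint the cube")
```

The second call has no matching action, so the cube is merely highlighted; the user can then follow up with a command the system does understand instead of restating the whole request.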

Conclusions 

Language is not only about conveying information^:
it is a tool for acting in the
world. Without immediacy with respect to
the world it is used in, it is not natural language.
Conversely, VR interaction without
language does not take place in a natural
or intuitive world. We are working on overcoming
some of the most fundamental weaknesses
of these two areas of interactive system
design by merging them.

^In fact, as an experiment, the reader is invited to
approximate how large a percentage of language use
the reader personally uses for conveying information.

ABSTRACT


browse  a  computer  generated  ^  interface  to 

the  interface,  metaphors  of  reDrp<?(»ni-Q?*  commands.  More  than 

judged  by  the  user.  WiS  Siem^^p^^?  f  and  action  are 

and  the  usage  of  the  svstpm  knowledge  of  the  goal 

metaphors  m*e  fundai^Sal*  S  thKn?  ch^acteristics  of  these 
model  and  the  task^o  be  a^e  cental 

presence  quality  amplify  this  reSS?enJ^^^  “°®tly  the 

INTRODUCTION 

thought  practice^°8].  ^^Even^ay^lS^uaffe^^h^  activity  but  a  human 
speech,  paintina  a  talk  ^^uage  abounds  in  illustrating 

PoIIUcianJ  S  sidS^wren  •'"‘P  understaSdtog® 

Sportsmen  struggle  for  rtctom  and  JtTon  ^  arroios  or  blows. 
arguments  is  unpleasant  See  a  ??a?  k?"  t)y  lack  of 

others  have  their  pJcT^iA  ^  w'Jnn  allegories  and 

between  rational  knowledge  and  StemrefatiJn  '^'  ^ 

ssjrs!i,"i«TsCSsH"n^'"“ 

IS  a  factor  that  limits  indenpnriJ^^tf^^f  iack  or  slow  growth 

other  domain.  ^  ^''f'yday  life  or  in 

become  more  and  more  abstract  knowWge,  these  models 

users  « physical »  reDrespntafi^^^l/^“  ^ore  effective.  The  naive 
processor  iid  disk  mS  fo?^“£,P  “W"  a  set  of  mem?^^ 

ysualmeg^f^‘g^|’„f®='„°//^fcv^e  packages  present  a 
d^ect  extension  of  the  thotS  the  be  a 

of  a  mental  model,  presenting  the  sta^tp^n?  Jhe  elaboration 

means  of  movement  or  acti^^TlTe  litSa^  suggesting 

the  use  of  a  linguistic  categorv  and  ^  metaphor  is 

one.  In  computer  pidgin  it  means^tn  another 

specific  data  or  of  the’eonmu^r  I  the  internal  state  of 
common  and  more  suitS  to  ^  naXSe  mf tr^slation  more 

or  what  is 

other1°”?^®?e”l^a?42r,^tS‘<>P  worS'S'the  metaphor  whUe 
regrouping  aU  actions  o/thinS  not^nteI^^^^°+^  ^  expression 
aspect  is  Often  neglected‘^«rs°o‘meX^3^ ‘?eo‘^LTe« 


remainder  of  to  metaphor  g^'^pfSr  wher’Twe 

“metimes  leave  the  ^.^'"^ter  “O^e  to  sm^  me 

by  the  haiidbooks  and  muatrated  by  ^e^t^aw  m  to  ^ 

top  ”s'^nl“n  ^d  Itooi  to  traduce  the  will  of  Interpretation, 

navigation  or  modification.  x  j.  u 

their  structural  rattens  mwetoan  by  toen  ph^^ 

SilSlS^^lS13i5S! 

SESSS^S^cSiSSe 

^S!mhltinr^Tus^ge  bufSil 

S^iSS^S^SeHisidv^^es  iike^ 

SilieSSn^WgemeS  f|'’"b"Vi;^u?l  “reality  W),  a  natural 
interaction  is  one  of  our  primary  wishes. 

Tinta  vir+iial  reality  is  at  the  crossroads  of  3D  grapMcs. 

he  Si  slgM  on  the  .  real  •  world  and  when  he  is  unphed  m  his  task 
itls^^men^y.  interaction  and  navigation  For 

imme^sfve^'^r  ^of  Ippucauon.  ^ 

”“%iS”pSs?nS‘?i^fri?nto  o“f-some  category  of  toetapto  of 

navigation^has  already  Droved  that  some  metaphors  ^e 

fnr  ifn  flnnlications  For  example,  simulating  an  object  taKen  ^ 


and  more  complex  if  the  software  needs  to  manipulate  some  of  its 
graphical  objects,  alternatives  of  navigation  or  action  are  sometimes 
the least bad solution. Each kind of activity, like games, scientific
visualization or architecture, has its own particularities about
navigation, immersion, exploration and interaction. A global and
common solution has never existed; we must limit our goal to finding a
general way to solve the terms of a problem.

These  reasons  motivate  a  will  to  rationalise  the  choice  of  a 
metaphor,  a  process  often  empirical  and  not  well  described.  With  a 
good  metaphor  the  learning  is  simpler,  the  use  quicker  and  more 
implicit  because  the  cognitive  system  relies  more  on  the  perceptive 
and motor system; the imagination is then unleashed. Using new 3D
presentations like MagicCap for a desktop, a modified FSN to control
the Jurassic Park Centre [24] or the navigation tool « Information
Visualizer » from XEROX may be some kind of solution for problems of
visualization.  The  selection,  action,  representation,  navigation  and 
dialogue  with  the  metaworld  are  as  many  different  needs  sometimes 
conflicting  to  resolve  in  order  to  obtain  a  satisfactory  final  interface. 

We present here a methodology that helps in the conception,
integration and testing of metaphors aimed at virtual reality. We
define a list of points of interest and, for each one, general criteria of
judgement.

CASE  STUDY 

In  the  goal  of  illustrating  our  subject,  we  will  examine  the 
problem  of  representation  of  the  file  system  of  a  workstation.  A 
computer  user,  advanced  or  not  must  be  able  to  explore  and  to  watch 
over  a  file  hierarchy,  to  evaluate  disk  usage,  to  find  out  new  files,  to 
visualize  or  to  execute  them. 

STAGES  IN  THE  CONCEPTION  OF  A  METAPHOR 

Functional definition

Amongst innovations, we must identify which ones should be
conveyed to the user, especially the spirit of the program and its modus
operandi. The main questions expressed by the operator are well
known [22]. The designer should be aware of their nature; he will then
try to list the pertinent ones in advance and to propose some
corresponding possible answers. Some interesting first approaches will
always be present.

Type of question      Canonical form of the question

1) Goal oriented      What kind of things can I do with this program?
2) Descriptive        What is it? What does it do?
3) Procedural         How does it do this?
4) Interpretative     Why has this happened? What does it mean?
5) Navigational       Where am I?

These questions imply which kind of clues we have to give to the
user. There are also different models, as close to one another as
possible, that must be developed:

a) the target system (discovered by the user) ;

b) the conceptual model of the target system (taught to the user) ;

c) the system image (the feeling of the user) ;

d) the mental model that the user has of the target system ;

e) the conceiver's model of the user's model.

Here, some important points are to show the hierarchical nature of
our file system, the notions of name, location, size, owner and age, and the
possibility to consult, execute or search for a file. The user must be helped
and encouraged to browse the file system and to locate himself in it.

Participatory  design 

[...] element because a lot of misunderstandings or
potential difficulties encountered by a user may be noticed earlier if
some time is spent discussing with users about their specific
knowledge and understanding, or testing in each stage the validity of
possible solutions. The designer is the lone responsible for users'
mistakes, so using prototypes and benches, observing users testing a
software or explaining its use themselves is an economic and proven
but not sure way to limit ulterior disappointments. Most of the time
jobs are associated with co-operative tasks, precise times or places;
workers are aware of it and can give some useful clues.

The notion of hierarchy and the orientation may be confusing
points for a user who is not very computer literate, as may the rights of access
to some files.

First  considerations 

[...] notice that the choice of a metaphor is partly implicit.

[...] stylistic devices which expose relations
between handled entities, information flows, possibilities of action on
them and their reactions, their organization... The problems
encountered by the technical people are less pertinent here than the
[...] ergonomist's duty to highlight these points and to
[...] vocabulary. The choice of words is here very
important, for example links and pipes have a different meaning
about directionality [8]. However, the context is very different
[...] application (game, [...]

[...] often their own metaphors they teach
themselves during the discovery of a job, they [...] time, space or
relations related problems. Each kind of problem [...]
interaction for games, navigation for visualization, [...]

[...] choice of the metaphor of representation
in the next stage will limit the other ones.

DIFFERENT  METAPHORS 
Metaphors  of  representation 

[...] suggestions exist about the conception of
[...] models. We have natural ways to generate metaphors [25] :

[...] mapping, coherence, vocabulary,
dimensionality of the problem, grammar, usage. Already aforesaid [...]

[...] characteristics or relations of objects

[...] real world. The spatial dimensions should reflect the
semantic dimension of the task that users are confronted with. It is not
[...] informations and their relations are convertible

[...] known geometric, organizational or architectural ordering. [...]

[...] techniques as data are
[...] diagram entities-relations as for a database
and an object model may give some clues before looking in the user's
[...] experience for possible candidates for a metaphor.

We have in our case study an example of strict hierarchy with a lot of


file.  So  shapes,  colours,  symbols,  sounds  may  be  good  candidates  to 
classify  things. 

Here, the file structure is strongly and hierarchically organised. We
find this property for example in our jobs, our towns or in a library. A
working person may exercise several tasks at the same time, which may be an
opportunity to represent the notion of link, but an administration is rarely
modified a lot. A city may illustrate a kind of hierarchy too, the style and size
of buildings giving clues about age and size. It can offer a global sight of
this ordering, a unique opportunity. This metaphor has been retained in
the « File System Navigator » [24] for this reason. At last, a library shows a
good notion of document, with different criteria of search, but here it would
be a richer library with text, video and other documents. Age or size may
find a good fit with different colours of paper and different thicknesses
of books; the library seems the most convenient solution.
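
As a rough illustration of the mapping suggested above (file age to paper colour, file size to book thickness), consider the following Python sketch. The class names, thresholds and scale factors are hypothetical choices made for the example; the paper does not specify them.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    size_bytes: int
    age_days: int


@dataclass
class Book:
    title: str
    thickness_mm: int
    paper_colour: str


def paper_colour(age_days):
    # Hypothetical thresholds: older documents get more yellowed paper.
    if age_days < 30:
        return "white"
    if age_days < 365:
        return "cream"
    return "yellow"


def thickness_mm(size_bytes, min_mm=5, max_mm=80):
    # Hypothetical scale of 1 mm per 10 KiB, clamped so that every book
    # stays visible on the shelf and none becomes absurdly thick.
    return max(min_mm, min(max_mm, size_bytes // 10240))


def to_book(f):
    """Translate file metadata into the appearance of a book."""
    return Book(f.name, thickness_mm(f.size_bytes), paper_colour(f.age_days))


shelf = [to_book(FileInfo("notes.txt", 2048, 3)),
         to_book(FileInfo("thesis.ps", 409600, 400))]
```

Clamping the thickness is one possible way to keep very small and very large files usable within the same shelf; a logarithmic scale would be another option.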

Metaphors  of  selection  and  action 

Selection. Voice, mouse, magical wand or any other peripheral has its
own advantages. Multimodal input is rich, redundant and attractive
but still far off. Objects must show their selectability by sight, sound or
touch. The trend to give collectors and methods to objects will allow
this dialogue [23]. Representing things in 3D space without force
feedback is a breach of security and confirmation. The comparison
between open and closed loops, i.e. without and with tactile or force feedback,
shows a 50% gain in time performance [11]. It must be remembered
that man is versatile enough to replace one sense by another, like
touch by sound.

Grammar of action. Two syntaxes are opposed: giving an object
then the action applied to it, or the opposite. Direct manipulation
chooses the order object-action, but natural activities make us take a
tool and then use it on the object. Choosing an object has the advantage of
limiting the choice of tools, and inversely. Everyday actions are often
two-handed, but few people have studied an equivalent in computer
dialogue [2]. Direct manipulation is inadequate for ordering a general
action, an example being given in [6].
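
The way each syntax narrows the remaining choices can be sketched with a small compatibility table. The tools and object types below are hypothetical and chosen only to fit the library case study.

```python
# Hypothetical table: which tool (action) applies to which object types.
COMPATIBLE = {
    "read": {"book", "video"},
    "play": {"video"},
    "move": {"book", "video", "lectern"},
}


def tools_for(obj):
    """Object-action order: an object was chosen, list the applicable tools."""
    return sorted(t for t, objs in COMPATIBLE.items() if obj in objs)


def objects_for(tool):
    """Action-object order: a tool was taken, list the objects it applies to."""
    return sorted(COMPATIBLE.get(tool, set()))


after_picking_book = tools_for("book")
after_taking_play = objects_for("play")
```

Either order prunes the second choice; which one feels more natural depends on the metaphor, as the paragraph above points out.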

Indication  of  possible  actions.  Appearance  of  the  different  objects 
of  a  scene  and  details  of  rendering  are  the  first  means  to  distinguish 
potential  actors  and  decoration.  Visual  organization  is  another,  a  third 
being  an  event  emitted  by  the  object  like  colour  or  sound  when  it  is  in 
the focus of an action; the graphical pointer can change too.
Ergonomics  state  that  the  shape  indicates  the  function  and  the  means 
of  action  [18]  but  by  fortunately  most  of  models  do  not  require 
interactions  [5]. 

For us, some documents (files) cannot be seen by a user (access
rights). The visual organization can show it, some books being too high on
the shelves to be taken, behind a glass or forbidden to reading because
they have a coloured pastille. Actions have the limits aforesaid, i.e.
searching with an index or with the help of writings on the edge of books
(author, rights, size...), consulting text or movies while placing the
document on a lectern. Here direct manipulation seems useful and is a
natural way to do different actions.

Variety and representation of actions. Actions are shown by tool
panels or multishaped cursors [2], [17]. Handled objects may be
helpful, giving some help or clues about potential actions, such as a
graphical handle signifying possible displacement. In an ideal case,
each modification of the object's internal state should be reflected
outside, be distinguishable and be reversible as far as possible [10].
The tool's scale of sensitivity must be fitted to the needed scale of
action.

The hand is our usual tool to consult documents. We do not have a
real need for a peripheral with six degrees of freedom because the books
on a shelf are always in the same vertical plane; a mouse may be
sufficient. When the hand is near a book, the index may be pointed to
help the reading of the edge. To take a book, a button kept pressed
as long as needed is a good alternative to the rendering of the weight of
the book. Some pertinent sounds (opening, taking...) will add some
realism.

Metaphors  of  creation 

Tools palette, modeller. Often, blank objects are presented in order to add
new components to them or to modify them with a tools palette. This
case is quite similar to the preceding one about action. Sculpting
shapes in a virtual world is quite unexplored for the moment.

Duplication of existing objects. Duplication is unusual in real life; to be
meaningful, it needs the metaphor of a duplicating
machine, but photocopying is not logical in every metaphor. Generally,
creation is an extension of manipulation, and leaving the virtual
world for the metaworld is often mandatory.

Here  we  have  no  need  to  create  new  books  but  if  necessary  we 
could  add  some  blank  books  that  could  be  filled  writing,  speaking, 
duplicating... 

Metaphors  of  navigation 

Completing  the  representation  metaphor,  this  one  is  justified  as 
soon  as  the  quantity,  nature  or  appearance  of  data  is  restricted  by  the 
display  peripherals  or  if  different  view  angles  allow  a  better 
understanding.  The  right  metaphor  is  the  one  leading  to  explore  data. 

Point of view. Ideally, the camera should be slaved to the
head, the body being its own vehicle, but the head is not very mobile.
Seeing through someone else's eyes is a frequent cause of sickness or
nausea. The world in the hand technique is psychologically better
adapted to fist sized worlds. The eye in the hand way is precise too
but can become uncomfortable quickly. Global performance is very
variable and depends on the global context.

Relocalization and orientation. Visual clues like different angular
speeds, occluding elements, having a global survey or judging distances
by rotating the camera are less costly and more important ways of
interpreting 3D relative positions than real stereoscopy. With his
innate sense of orientation and some contextual indications placed
at key points, the operator is able to build his mental map of the
virtual world. Techniques used by architects in modern buildings can
be helpful for the virtual world designer [19]. He has the responsibility
to fix an arbitrary arrangement if none exists. If the spatial
organization is complex, or at least during the first journeys in a virtual
world, a graphical map has to be displayed [5].

Choice and reach of a destination. A journey may be accomplished by
teleportation, the destination being specified or a portal, hidden or not,
being stepped through [23]. Instead, the direction and target of a smooth
movement can be given, which is a practical solution. This may be
specified by the direction of sight or of the hands, by a device or along a
privileged direction [14]. The movement tied to the direction of sight is probably
the best for learners. The last way of giving a direction is the use of a
physical activity like walking or cycling, where an implicit direction is
given [7]. Here too, the choice of a technique is strongly tied to the
nature of the exploration and the environment.

Speed control. Instantaneous movement is the privilege of
teleportation, but on the other hand each time a mental map has to be
[...] reorientation. Adopting a speed varying logarithmically with
the distance which remains is a very comfortable method, well suited
for short or long movements [16]. The speed may be bound to a
physical or symbolic action, following the rhythm of some device
activity. In the WalkThrough simulator, a bike is used to visit a
building, the wheels being braked when going upstairs. This is a good
example where a closed loop exists during the movement.
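
A speed that varies logarithmically with the remaining distance can be sketched as below. The exact law used in [16] is not given here, so the formula v = gain * ln(1 + d / scale), the parameter values and the simple Euler integration are all assumptions made for illustration.

```python
import math


def speed(remaining, gain=2.0, scale=1.0):
    """Assumed law: speed grows with the log of the remaining distance."""
    return gain * math.log1p(remaining / scale)


def travel(distance, dt=0.05, tolerance=0.01, max_steps=100_000):
    """Move towards the target with explicit Euler steps.

    Returns (number of steps taken, distance still remaining).
    """
    remaining = distance
    steps = 0
    while remaining > tolerance and steps < max_steps:
        # Never overshoot the target.
        remaining -= min(remaining, speed(remaining) * dt)
        steps += 1
    return steps, remaining


steps_short, rest_short = travel(1.0)
steps_long, rest_long = travel(100.0)
```

Far from the target the viewpoint moves fast, and it slows down smoothly on approach, so a journey a hundred times longer takes far fewer than a hundred times as many steps.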

Constraints on the movement. Living in a 3D world, man
builds mostly mental maps with two dimensions. In a building, we
change floors rather than altitude. A flight interface raises
problems of co-ordination amplified by narrow lines of view and lack
of gravity [28]. So, an artificial sky-line or far objects are needed to
help our peripheral vision. At the same time, nearer displayed
objects bound to us, like hands or feet, limit the visual stress and
lessen potential bad orientation effects [4]. Being out of reach of the vulgum
pecus, the control of 6 degrees of freedom in space has to be limited to 5
or less. For coherency, visual obstacles must make the movement
physically more difficult or longer.

Medium. With the help of peripheral devices, inputs may be
directions, gestures or rhythms [21]. When a glove is used, the hand gives
direct, symbolic or constrained orders, discrete or continuous [27].
Although common for piloting vehicles, using the feet has not really been
studied, except in one case as a potential pointing device, even though it frees
the hands or gives a multimodal input [20]. Several actions helping the
fulfilment of the same task do not generate confusion; the
performance is generally better and the presence strengthened.
Physically involving, the walk may be simulated, limited by obstacles,
staying in the same place or unlimited with the aid of a travelling
band [7]. Replicas of different pilots' cockpits give consistent interfaces
but are generally not well suited for action metaphors. Virtual cockpits
are useful to define future prototypes but often appear of no use for
real driving, at least for lack of force feedback [26].

In our library, the natural candidate for the movement is the walk, at
least for logical and practical reasons. To help the explorer, the opening of a
folder may be translated into a walk to new shelves. Instead of a walk, a
teleportation with a soft transition of images and a sound of steps may
signify this displacement, a map helping to orientate oneself or to go back to a
preceding location. If we need a global sight, a different geographic
organization is required. The search for files has a substitute in filling in a
search card.

Metaworld 

Disengagement. V.R. faces the specific question of
disengagement when being immersed in the virtual world. It can be to
access a keyboard or, in an augmented reality system, to see just the
real environment. Prepared for this, the device drivers have the
responsibility to judge this intention and to inform the main program.
Exterior events like getting too close to a wall must be signalled. Here,
the real world enters the virtual world.

Inputs, outputs, exchanges. By definition, the metaworld
represents all that is not the application itself. In
an editor, we toggle between command and edit modes. Here it is the
same: we may need to give commands not possible otherwise, to
modify the appearance of our interface or to exchange documents with
the outside. The ultimate goal should be not to have to leave the inner
side of a metaphor except to use another software. The clipboard in a
Windows-like environment allows data to be exchanged, but most of the
time the limits of the metaphors are blurred and reconfiguration or
external activities are poorly consistent. In a desktop metaphor,
modifying the colours or some parameters may be resolved with a
paper catalogue of desk accessories which allows the reconfiguration
without « leaving » the context of the desktop environment. Metaphors
are not self-inclusive without difficulties, but some tricks can mitigate
the consequences of this limitation.

In our library metaphor, the graphic representation can accept
different rooms for different activities and a small wagon for exchanging
data (name or content of files) with external activities. Switching jobs
implies passing doors or leaving our library to turn off the light switch.

GLOBAL  COHERENCE 

Ergonomics. This stage verifies that all the actions are well
enumerated, easy to learn or to remember, and easily distinguishable
except in case of strong analogies.

Scale  factors.  Time  lengths  and  delays,  distances  must  be  evident 
and  possibly  the  same  in  each  metaphor.  Real  or  differed  time,  in 
place,  distant  and  other  scale  actions  are  pertinent  informations  and 
should  be  reflected  clearly. 

Environment.  If  possible,  a  majority  of  the  objects  which  are 
present  in  the  real  world  that  the  metaphor  represents  must  be  here. 
The  virtual  world  will  offer  a  complete  picture  and  the  presence  will  be 
higher.  Fundamentally  different  concepts,  scales  of  space  or  time  that 
are  too  large,  objects  too  technologically  distant  are  real  obstacles  to 
the global coherence. V.R. especially needs an excellent coherence
because  the  user  is  very  «  close  »  to  his  interface  and  can  be  rebuffed 
much  more  easily. 

The initial goals are fulfilled with natural, instant and reduced
interactions with familiar objects in the common environment of a library
with all its usual features.

CRITERIA  TO  EVALUATE  METAPHORS 

For  each  proposition,  we  have  to  measure  the  functionalities 
along  some  criteria  which  are  very  application-type  dependent.  Some 
criteria proposed elsewhere were insufficient and not satisfactory for
the  needs  of  V.R.  We  had  to  extend  or  modify  them  [8],  [12]. 

Coherence.  It  measures  the  clearness  of  actions  and  their 
consequences.  We  need  to  guess  an  internal  logic  and  strong  relations 
between  action  on  the  metaphor  and  internal  modifications  of  data. 

Presence.  When  the  operator  finds  everything  he  needs  in  the 
virtual  world,  if  all  his  senses  are  fed  and  work  together  in  a  coherent 
universe  then  this  ultimate  goal  is  reached,  the  virtual  world 
becoming  the  real  world. 

Malleability. It is possible to configure the interface differently
or to enrich it: the user is not obliged to follow a single way of
working but is instead able to define his own methods.

Participation and interactivity. Like a real-life action which uses
all our senses and our concentration, a metaphor must imply full
participation. A game-like aspect can serve the interactivity. Moreover,
the exchange is directed at a human being.

Personality. The spirit, appearance or other qualities of a program
may carry a distinctive signature which makes it different from
others. It is sometimes a criterion of choice and of remembering.

Adequacy. Nothing is worse than having to decipher how to use
a tool. A metaphor must tell us what we are doing here and how to do
it. When it helps to accomplish a piece of work, clarity is gratifying.

Learning. Any metaphor must be based on the knowledge of an
individual who is accustomed to the task but has only general
knowledge, with no cultural or technical particularities. Progressive
and pleasant, the learning must build mental models that are more and
more abstract.


Representation. Even if perfect, a program must adapt itself to
the limits of the current technology. For example, a simple but
responsive interaction is better than a slow but detailed one. A device
driver that takes the means of rendering into account should adapt the
real possibilities of the software to those of the hardware.

Ergonomics. Mentioned more precisely here for V.R., we must
limit physical, visual and mental tiredness. The current conditions
of V.R. are more restrictive than anywhere else.

Usage and possibilities of evolution. Recurring operations are to
be lightened or eliminated. A good metaphor is fully exploited and
adequate. Moreover, it can adapt without too many conditions to changes
or additional requests in the terms of the problem.

This library style interface seems adequate to most of the initial
terms and will be comfortable if the interface is clean and smooth. We can
ask ourselves whether this metaphor can accept additional requirements such
as modification of file rights, moving or deleting files, use of search or
display filters... It seems that a lot of options are allowed here; the
metaphor still has some potential.

CONCLUSION 

A global survey of the elaboration of our library metaphor has
highlighted some requisite points not stated in the first terms of the
problem. The need for a common metaphor, for search criteria and short
cuts, and for possibilities of evolution has been shown. The result
shows a good fulfilment for navigation, selection and interaction. The
benefits of avoiding loss of orientation and of allowing exchanges with
external jobs are now clear. The user of our library has no use for a
metaworld except to do other jobs. There is no distinction between two
modes like « editing » or « commands », which is an excellent thing
because this metaphor is self-sufficient.

The modus operandi discussed above aims to be general but
complete, and we believe we have made a step towards our goal. This
methodology was proposed to help to design and to integrate
metaphors. It is better suited to the particular nature of virtual reality
than other methodologies because different needs such as navigation,
creation and others are addressed. Through these stages of analysing
the elements of a problem, the designer gains a better understanding of
the important points that he wants to communicate. Nevertheless, the
adoption of a program and its interface is often a matter of personal
taste. To mitigate this, participative design throughout this work is
essential to prevent misunderstandings, which easily confuse the user.

FUTURE  WORK 

In order to characterise further action and the judgement, a priori
or a posteriori, of the adequacy of metaphors to statements, the
following points appear important:

- measuring and defining clearly the basic characteristics of
presence, and the relations between the feeling of presence and the
qualitative or quantitative performance of a user;

- studying relationships between perceptual space, cognitive
space and the degrees of freedom of the peripherals, and extending
metaphors to problems of more than three dimensions;

- improving feedback, finding substitutes for force feedback, and
using sound and voice.
1. Introduction

Years of research into hyper-media systems have shown that finding one’s way through
large electronic information systems can be a difficult task. Our experiences with virtual
reality suggest that users will also suffer from the commonly experienced “lost in
hyperspace” problem when trying to navigate virtual environments.

The goal of our paper is to propose and demonstrate a technique which might help
overcome this problem. Our approach is based upon the concept of legibility, adapted from the
discipline of city planning. The legibility of an urban environment refers to the ease with
which its inhabitants can develop a cognitive map over a period of time and so orientate
themselves within it and navigate through it [Lynch60]. Research into this topic since the
1960s has argued that, by carefully designing key features of urban environments, planners can
significantly influence their legibility.

Our paper proposes that these legibility features might be adapted and applied to the
design of virtual environments of all kinds and that, when combined with other navigational
aids such as the trails, tours and signposts of the hyper-media world, they might greatly enhance
people’s ability to navigate them. In particular, the primary role of legibility would be to help
users to navigate more easily as a result of experiencing a world for some time (hence the idea
of building a cognitive map). Thus, we would see our technique being of most benefit when
applied to long term, persistent and slowly evolving virtual environments. Furthermore, we
are particularly interested in the automatic application of legibility techniques to information
visualisations as opposed to their relatively straightforward application to simulations of the
real world. Thus, a typical future application of our work might be in enhancing visualisations
of large information systems such as the World Wide Web.

Section 2 summarises the concept of legibility as used in the domain of city planning
and introduces the key features that we have adapted in our work. Section 3 then describes a set of
algorithms for the automatic creation or enhancement of these features within virtual data
spaces. Next, section 4 presents two example applications based on two different kinds of
virtual data space. Finally, section 5 presents some initial reflections on this work and
discusses the next steps in its evolution.

2.  What  is  legibility? 

Legibility, in the context of navigation and wayfinding, is a term which has been used
for many years in the discipline of City Planning. Work on legibility in this area has been
concerned with the way in which people are able to ‘read’ an environment and hence perform
wayfinding tasks. In his book “The Image of the City” [Lynch60] Kevin Lynch defines the
legibility of a city as: “...the ease with which its parts may be recognised and can be organised
into a coherent pattern...” Here, Lynch is referring to the formation of a cognitive map within
the person's mind [Passini92], a structure which is an internal representation of an environment
which its inhabitants use as a reference when navigating to a destination. The Image of the
City describes experiments carried out in a number of major US cities which show how the
cognitive map is built up over time through experience of the city. The experiments involved
obtaining information from long term inhabitants of the cities in the form of, for example,
interviews, written descriptions of journeys through the city and drawn maps. By examining
this data Lynch identified five major elements of urban landscapes which are recognised by the
inhabitants and used as the building blocks of the cognitive maps. These features are:

•  Landmarks.  Static  and  recognisable  objects  which  can  be  used 
to  give  a  sense  of  location  and  bearing 

•  Districts.  Sections  of  the  environment  which  have  a  distinct 
character  which  provides  coherence,  allowing  the  whole  to  be 
viewed  as  a  single  entity 

•  Paths.  Major  avenues  of  travel  through  the  environment  such  as 
major  roads  or  footpaths 

•  Nodes.  Important  points  of  interest  along  paths,  e.g.  road 
junctions  or  town  squares 

•  Edges.  Structures  or  features  providing  borders  to  districts  or 
linear  obstacles 

3.  Legibility  techniques  for  virtual  environments 

The aim of ongoing research at Nottingham University is to apply the work described
above not to real environments but to the artificial spaces of virtual reality systems. More
specifically we are developing techniques to automatically construct the five legibility features
in the abstract spaces produced by data visualisation systems such as database or document
store visualisers. One of our main aims is to accomplish this without requiring the users
of the system to perform the placement of the features manually. Essentially the system
should, wherever possible, identify and place the features using information available from the
database and visualisation systems alone.

To  do  this  we  have  constructed  a  prototype  system  called  LEADS  (LEgibility  for 
Abstract  Data  Spaces)  which  is  designed  to  provide  a  layer  on  top  of  existing  visualisation 
systems  and  which  performs  the  addition  of  legibility  information  to  the  space.  LEADS  acts 
on  the  position  of  data  items  provided  by  this  underlying  system,  as  well  as  accessing  the  raw 
data  where  necessary,  to  add  and  emphasise  the  objects  which  are  used  to  improve  the 
legibility  of  the  environment. 

LEADS is designed to be applied to spaces which satisfy three main criteria. They
should: be persistent over relatively long periods of time; be relatively stable, so that they
evolve gradually over their lifetime and are rarely disturbed by major upheavals in the
database; and be accessed repeatedly by a number of independent users. An example of a large
space to which application of LEADS techniques might be appropriate is the WWW space, which is
constantly evolving but which rarely undergoes global scale restructuring.

To place the legibility features LEADS uses districts as a starting point as this allows
for a number of relatively simple techniques to be used to form the other features. Districts in
a data space will be clusters of items which have some sort of internal similarity. LEADS
applies a clustering algorithm to the data to identify these groups automatically. The algorithm
chosen for the initial implementation is Zahn’s Minimum Spanning Tree Algorithm [Zahn71],
which works by forming a minimal spanning tree of the distances between the data items and
then walking the tree to remove links which are significantly longer than others nearby. The
sub-trees resulting from the use of the algorithm each form a district in the space.
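
The district-forming step can be illustrated with a short C sketch. It is not the LEADS implementation: it assumes 2D item positions, builds the spanning tree with Prim's algorithm, and replaces the local "longer than others nearby" test with a simpler global rule (a link is removed when its squared length exceeds `factor` times the mean squared link length). The names `Item`, `mst_districts`, `MAX_ITEMS` and `factor` are hypothetical.

```c
#include <float.h>

#define MAX_ITEMS 64

typedef struct { double x, y; } Item;

/* Squared Euclidean distance; it yields the same spanning tree as distance. */
static double dist2(Item a, Item b)
{
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

/* Follows parent links to the representative of i's group. */
static int find_root(const int *parent, int i)
{
    while (parent[i] != i)
        i = parent[i];
    return i;
}

/*
 * Writes a district number in 0..count-1 to label[i] for each of the n
 * items and returns count (0 if n is out of range).
 * Step 1: grow a minimum spanning tree with Prim's algorithm.
 * Step 2: drop tree links whose squared length exceeds factor times the
 *         mean squared link length (simplified, global cut rule).
 * Step 3: the sub-trees that remain connected are the districts.
 */
int mst_districts(const Item *items, int n, double factor, int *label)
{
    int in_tree[MAX_ITEMS], from[MAX_ITEMS], parent[MAX_ITEMS];
    int link_a[MAX_ITEMS], link_b[MAX_ITEMS], id[MAX_ITEMS];
    double best[MAX_ITEMS], link_len[MAX_ITEMS];
    double total = 0.0, mean;
    int links = 0, count = 0, i, step;

    if (n <= 0 || n > MAX_ITEMS)
        return 0;

    for (i = 0; i < n; i++) {
        in_tree[i] = 0;
        from[i] = -1;
        best[i] = DBL_MAX;
        parent[i] = i;
        id[i] = -1;
    }
    best[0] = 0.0;

    for (step = 0; step < n; step++) {
        int u = -1;
        /* Pick the item outside the tree that is closest to it. */
        for (i = 0; i < n; i++)
            if (!in_tree[i] && (u < 0 || best[i] < best[u]))
                u = i;
        in_tree[u] = 1;
        if (from[u] >= 0) {
            link_a[links] = from[u];
            link_b[links] = u;
            link_len[links] = best[u];
            total += best[u];
            links++;
        }
        /* Update the cheapest known link for every remaining item. */
        for (i = 0; i < n; i++) {
            double d = dist2(items[u], items[i]);
            if (!in_tree[i] && d < best[i]) {
                best[i] = d;
                from[i] = u;
            }
        }
    }

    mean = links > 0 ? total / links : 0.0;
    for (i = 0; i < links; i++)
        if (link_len[i] <= factor * mean)
            parent[find_root(parent, link_a[i])] = find_root(parent, link_b[i]);

    for (i = 0; i < n; i++) {
        int root = find_root(parent, i);
        if (id[root] < 0)
            id[root] = count++;
        label[i] = id[root];
    }
    return count;
}
```

With two well separated groups of three items each and `factor` set to 2, the single long link between the groups is removed and two districts result. A faithful version of Zahn's test would compare each link only with the links near it in the tree.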

Landmarks need to be placed in a position where they will be useful for navigation. We
have used a premise that one such place will be where districts are dense. The first step in
positioning a landmark is to find groups of three clusters which are mutually adjacent. A
landmark will be placed in a position which is fairly central to these districts. This is found by
finding the centroids of each district and placing the landmark at the centre of the triangle they
form.
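
The geometric part of this placement is small enough to sketch in C. The sketch assumes 3D item positions and reads "the centre of the triangle" as its centroid (the mean of the three district centroids); finding which triples of districts are mutually adjacent is not covered. The names `Vec3`, `district_centroid` and `landmark_position` are hypothetical.

```c
typedef struct { double x, y, z; } Vec3;

/* Mean position of the n items (n > 0) that make up one district. */
Vec3 district_centroid(const Vec3 *items, int n)
{
    Vec3 c = {0.0, 0.0, 0.0};
    int i;

    for (i = 0; i < n; i++) {
        c.x += items[i].x;
        c.y += items[i].y;
        c.z += items[i].z;
    }
    c.x /= n;
    c.y /= n;
    c.z /= n;
    return c;
}

/* Centroid of the triangle formed by three district centroids. */
Vec3 landmark_position(Vec3 a, Vec3 b, Vec3 c)
{
    Vec3 p;

    p.x = (a.x + b.x + c.x) / 3.0;
    p.y = (a.y + b.y + c.y) / 3.0;
    p.z = (a.z + b.z + c.z) / 3.0;
    return p;
}
```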

Edges  in  LEADS  are  structures  which  help  to  define  the  borders  of  districts.  They  are 
placed  in  the  space  between  those  districts  which  are  significantly  large.  To  accomplish  this 
they  are  placed  between  the  nearest  neighbours  in  the  cluster  and  aligned  along  the  line  that 
joins  them.  In  most  cases,  especially  where  the  clusters  are  essentially  spherical,  this  results  in 
an  edge  placement  which  effectively  separates  the  space  but  does  not  cut  into  the  individual 
districts. 
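
As an illustration of the first half of this step, the following C sketch finds the nearest pair of items between two districts and the point halfway between them, which could serve as the position of an edge object. It is an assumption-laden sketch: the brute-force search, the use of the midpoint, and the names `Vec3`, `EdgeAnchor` and `find_edge_anchor` are not taken from LEADS, and the orientation and size of the edge are left out.

```c
typedef struct { double x, y, z; } Vec3;

typedef struct {
    int a_index;    /* index of the nearest item in district A */
    int b_index;    /* index of the nearest item in district B */
    Vec3 midpoint;  /* point halfway between those two items */
} EdgeAnchor;

static double dist2(Vec3 p, Vec3 q)
{
    double dx = p.x - q.x, dy = p.y - q.y, dz = p.z - q.z;
    return dx * dx + dy * dy + dz * dz;
}

/* Brute-force search over all pairs; na and nb must both be > 0. */
EdgeAnchor find_edge_anchor(const Vec3 *a, int na, const Vec3 *b, int nb)
{
    EdgeAnchor e;
    double best = dist2(a[0], b[0]);
    int i, j;

    e.a_index = 0;
    e.b_index = 0;
    for (i = 0; i < na; i++) {
        for (j = 0; j < nb; j++) {
            double d = dist2(a[i], b[j]);
            if (d < best) {
                best = d;
                e.a_index = i;
                e.b_index = j;
            }
        }
    }
    e.midpoint.x = (a[e.a_index].x + b[e.b_index].x) / 2.0;
    e.midpoint.y = (a[e.a_index].y + b[e.b_index].y) / 2.0;
    e.midpoint.z = (a[e.a_index].z + b[e.b_index].z) / 2.0;
    return e;
}
```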

Nodes  and  paths  are  co-dependent  in  the  LEADS  system.  Eventually  the  aim  is  to  have 
these  features  evolving  out  of  the  use  of  the  data  space.  The  initial  prototype  currently 
identifies  nearest  neighbours  between  districts  as  nodes  and  places  a  path  between  them. 
Within  districts  all  those  items  identified  as  nodes  are  joined  by  a  spanning  tree. 

4.  Two  example  applications 

We have so far applied LEADS to the visualisations produced by two existing
information visualisation tools. The first of these, Q-PIT, is a system which works on
databases where the items have a number of well defined fields [Benford94]. Three of these
fields may be chosen so that the values they contain are mapped onto the three major axes of
the space to give the position of the data items. The remaining fields may be used to define
aspects of the representation of the items such as their shape, colour or speed of rotation.

The second system, Grapher, is a 3D graph drawing tool. This takes a representation of
an arbitrary network and produces a 3D visualisation by representing the nodes as balls and
the links as springs. Initially the items will be placed randomly and the system will then go
through a cycle of repositioning the nodes based on the tension in the springs until a relatively
stable formation is found.

LEADS attempts to enhance the legibility of both systems through the above
techniques. Figure 1 shows before and after shots from Q-PIT. Figure 2 shows before and after
shots from Grapher¹.


5.  Reflections 

Our initial experiences of developing and testing LEADS have been largely positive
and we suspect that this approach does have the potential to significantly improve the
legibility of virtual environments. However, we have encountered a number of difficulties
which suggest some immediate improvements to the system:

First, legibility features such as paths seem to sit uncomfortably with 6 degrees of
freedom navigation. Paths in a city are actually travelled along and the environment is
therefore experienced from the perspective of the path. We suspect that users of our legible
virtual environments should also directly experience paths as part of navigation. Thus, we
need to develop simple interface techniques to support navigation via paths (e.g. “take me
along the nearest path in this direction”).

1. Because of the extensive use of colour in LEADS these plates cannot fully demonstrate the
effectiveness of the enhancement. We hope however that they do give some idea of the effect of the
addition of legibility features.

Second, automatically determining an appropriate scale and appearance for legibility
features has not always been easy. In particular, features such as landmarks and edges must be
visible without being intrusive. Creating useful edges has proved to be a particularly difficult
task as an edge should ideally follow the contours of a district. In a 3-D space, determining a
sensible size for edges has been difficult. At one extreme an edge might be a hull completely
surrounding a district. At the other it might be a thin flat surface dividing two districts. The
former is likely to be visually intrusive; the latter is likely to provide an insufficient sense of
separation between districts.

Third,  other  features  and  tools  are  clearly  needed  to  help  people  navigate.  For 
example,  the  use  of  textual  information  in  the  form  of  signposts  is  an  important  part  of 
navigating  conventional  urban  environments.  We  can  imagine  adding  signposts  to  virtual 
environments  and  placing  them  at  nodes  pointing  along  paths.  Furthermore,  we  can  imagine 
that  signposts  might  refer  to  districts  and  landmarks  that  lie  along  a  path.  However,  this  gives 
rise  to  the  problem  of  how  to  name  districts  and  landmarks.  More  specifically,  given  that 
districts  and  landmarks  are  automatically  created  by  LEADS,  we  are  left  with  the  problem  of 
automatic  name  generation  or  alternatively  the  use  of  non-textual  symbolic  identifiers  on 
signposts. 

Finally, we need to be careful that by adding additional objects to an information
visualisation, we do not increase the rendering overhead thereby degrading system
performance. However, we suspect some extensions to LEADS might enable it to actually
improve system performance. Specifically, LEADS might enable a general distancing effect
for data spaces by allowing the individual objects in a district to be replaced by an overall
representation of the district when viewed from a distance.
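
One way such a distancing effect might be decided per district is sketched in C below. This is speculative illustration only: the fixed switch-over distance, the use of the district centre as the reference point, and the names `Vec3` and `use_district_proxy` are assumptions, not part of LEADS.

```c
typedef struct { double x, y, z; } Vec3;

/*
 * Returns 1 when the district centre is farther from the viewer than
 * switch_distance, i.e. when a single overall representation could be
 * drawn instead of the individual objects; returns 0 otherwise.
 * Squared distances are compared so that no square root is needed.
 */
int use_district_proxy(Vec3 viewer, Vec3 district_centre, double switch_distance)
{
    double dx = district_centre.x - viewer.x;
    double dy = district_centre.y - viewer.y;
    double dz = district_centre.z - viewer.z;
    double d2 = dx * dx + dy * dy + dz * dz;

    return d2 > switch_distance * switch_distance;
}
```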

6.  Summary 

We  have  described  a  number  of  general  techniques  for  improving  the  legibility  of 
virtual  environments  so  that  their  users  might  more  easily  construct  cognitive  maps  to  help 
them  navigate.  Our  work  has  adapted  techniques  from  the  discipline  of  city  planning  where 
decades  of  experience  have  identified  key  features  of  urban  landscapes  which  are  critical  to 
their  legibility.  The  primary  goal  of  our  work  has  been  to  develop  a  set  of  algorithms  for 
automatically  creating  or  enhancing  these  features  within  information  visualisations.  These 
include: 

•  the  use  of  clustering  algorithms  to  create  districts; 

•  the  creation  of  edge  objects  separating  districts; 

•  the  placement  of  landmark  objects  at  a  central  point  between 
districts; 

•  emphasising  nearest  neighbour  node  objects  within  districts  and 
creating  paths  between  them. 

We  have  implemented  these  techniques  in  a  system  called  LEADS  which  is  intended 
to  provide  an  additional  legibility  layer  sitting  on  top  of  current  information  visualisations.  So 
far,  we  have  applied  LEADS  to  two  existing  and  contrasting  information  visualisation  tools. 
Our  paper  included  some  before  and  after  screen-shots  to  show  the  overall  effect  of  LEADS 
on  these  systems. 

Our early experiences have been positive and suggest that this approach is promising.
However, we have encountered several difficulties including the need to experience paths
when navigating; the difficulty of getting an appropriate scale for features such as edges and
landmarks; and problems with automatically naming features so that they may be referred to
by signposts.

Longer  term  work  will  include  combining  LEADS  with  larger  scale,  more  widely  used 
information  sources  (e.g.  visualising  the  World  Wide  Web)  and  carrying  out  a  more  formal 
evaluation  of  our  work.  For  the  latter,  we  intend  to  revisit  and  adapt  the  initial  experiments 
carried  out  by  Lynch  within  real  cities,  but  this  time  within  virtual  environments. 
Figure 2: Grapher before (top) and after (bottom) legibility enhancement using LEADS


Fast algorithms for drawing Nonuniform B-spline surfaces:
a practical application in Virtual Environment
Abstract

The paper we intend to present deals with the problem of
drawing parametric surfaces in real-time. A new fast
sequential and parallel algorithm for approximating parametric
surfaces with polygons is described. Its application on
single and multi-processor systems is presented by
emphasizing the constraints imposed by real-time applications
for Virtual Reality. In particular a practical utilization of
parametric surfaces for the modeling of a virtual hand-arm
complex is described.

1  Introduction 

The requirement of real-time performance can be
considered one of the indispensable aspects to be taken into
account when control of a direct interaction with Virtual
Environments must be obtained. This fact assumes
even greater importance for those applications tied to
manipulative procedures, in which the human operator is
asked to touch, explore and grasp virtual objects
according to a realistic behaviour [7] [8].

The constraint of achieving a high degree of realism
in the performance of such operations involves all
afferent sensory modal pathways of the interface systems. In
particular, as far as the visual component is concerned,
the intense utilization of high level rendering techniques
for the graphical representation of the objects and human
body parts belonging to the Virtual Environment must
be considered. In this paper we address the problem of
achieving an adequate real-time graphical representation
of human body parts by exploiting a new algorithm based
on Nonuniform B-spline surface descriptions.

A theoretical introduction to the problem is given in
the following by means of a brief description of the
definition of nonuniform B-splines. For a complete introduction
to the theory of parametric surface representation refer
to [1][2][3][5][4].

If $n_p$ is the number of control points and $k$ is the order
of the nonuniform B-splines, the nonperiodic knot vector, for
a curve or surface whose end points coincide with the
end control points, is:

$$T = (t_0, \ldots, t_{n_p+k}) = (\underbrace{0, \ldots, 0}_{k},\; t_k, \ldots, t_{n_p},\; \underbrace{1, \ldots, 1}_{k})$$

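Remember that we have to define only one knot vector
for a curve, while, in the case of a surface, we have to
define two knot vectors, one for each parametric direction
($u$ and $v$). The blending functions are:

$$N_{i,1}(u) = \begin{cases} 1 & \text{if } t_i \le u < t_{i+1} \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

$$N_{i,k}(u) = \frac{(u - t_i)\,N_{i,k-1}(u)}{t_{i+k-1} - t_i} + \frac{(t_{i+k} - u)\,N_{i+1,k-1}(u)}{t_{i+k} - t_{i+1}} \qquad (2)$$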
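
The recursion for the blending functions translates almost directly into C. The following is a minimal sketch rather than the authors' implementation: the function name `bspline_basis`, the zero-based array `t` holding the knot vector, and the convention that a term with a zero denominator contributes zero (needed for repeated knots) are assumptions made for illustration.

```c
/*
 * Blending function N_{i,k}(u) of order k over the knot vector t,
 * evaluated with the recursive definition. The caller must make sure
 * that t[i] .. t[i + k] exist. Terms whose denominator is zero
 * (repeated knots) are treated as zero.
 */
double bspline_basis(int i, int k, double u, const double *t)
{
    double left = 0.0, right = 0.0, d;

    if (k == 1)
        return (t[i] <= u && u < t[i + 1]) ? 1.0 : 0.0;

    d = t[i + k - 1] - t[i];
    if (d > 0.0)
        left = (u - t[i]) * bspline_basis(i, k - 1, u, t) / d;

    d = t[i + k] - t[i + 1];
    if (d > 0.0)
        right = (t[i + k] - u) * bspline_basis(i + 1, k - 1, u, t) / d;

    return left + right;
}
```

With the knot vector (0, 0, 0, 1, 1, 1) and order 3 the three blending functions reduce to the quadratic Bernstein polynomials, so at u = 0.5 they evaluate to 0.25, 0.5 and 0.25. Because the order-1 test uses a half-open interval, every function is zero at exactly the last knot value; callers usually handle that end point separately.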

The general formula for evaluating a parametric surface
with $n \times m$ control points at a point $(u, v)$ is:

$$Q(u,v) = \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} P_{i,j}\, N_{i,k_u}(u)\, N_{j,k_v}(v) \qquad (3)$$

where $k_u$ and $k_v$ are the orders of the surface in $u$ and $v$.
For evaluating the efficiency of each algorithm, we define:

$$Cost_Q(n, m, k) = n\,m\,Cost_N(k) \qquad (4)$$

where

$$Cost_N(k) = \begin{cases} [\ldots] & \text{if } k = 1 \\ [\ldots] + 2\,Cost_N(k-1) & \text{otherwise} \end{cases}$$

as the number of multiplications for evaluating the
parametric surface in $(u_{index}, v_{index})$ with the function $Q$.

2  The  algorithm 

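If we fix a set $S$ of $n_s$ sample points:

$$S = \{(u_0, v_0)_0, \ldots, (u_{n_s-1}, v_{n_s-1})_{n_s-1}\} \qquad (5)$$

we can define a new function:

$$NN_{i,j}(index) = N_{i,k_u}(u_{index})\, N_{j,k_v}(v_{index}) \qquad (6)$$

We can now define the matrix $NT_{n \times m \times n_s}$ (N Table)
with $NT_{i,j,index} = NN_{i,j}(S_{index})$; for example the 2D
matrix $NT_{i,j,1}$ is equal to:

$$\begin{bmatrix}
NN_{0,0}(S_1) & NN_{0,1}(S_1) & \cdots & NN_{0,m-1}(S_1) \\
NN_{1,0}(S_1) & NN_{1,1}(S_1) & \cdots & NN_{1,m-1}(S_1) \\
\vdots & \vdots & \ddots & \vdots \\
NN_{n-1,0}(S_1) & NN_{n-1,1}(S_1) & \cdots & NN_{n-1,m-1}(S_1)
\end{bmatrix}$$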
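
One slice of the N Table, together with the weighted sum that a stored slice makes possible, can be sketched in C as follows. The sketch is illustrative only: it assumes the blending-function values at the sample point have already been computed into the arrays `bu` and `bv`, stores the slice in row-major order, and treats the control points as scalars (one coordinate of $P_{i,j}$; a surface in space would repeat the sum for x, y and z). The names `fill_nt_slice` and `eval_with_table` are hypothetical.

```c
/*
 * Fills the n x m slice of NT for one sample point:
 * nt[i * m + j] = bu[i] * bv[j], where bu[i] is the value of the i-th
 * blending function in u and bv[j] the value of the j-th one in v,
 * both taken at that sample point.
 */
void fill_nt_slice(const double *bu, int n, const double *bv, int m, double *nt)
{
    int i, j;

    for (i = 0; i < n; i++)
        for (j = 0; j < m; j++)
            nt[i * m + j] = bu[i] * bv[j];
}

/*
 * Evaluates one coordinate of the surface at the sample point as the
 * sum of control values p weighted by the stored slice: n * m
 * multiplications and no blending-function evaluation at run time.
 */
double eval_with_table(const double *p, const double *nt, int n, int m)
{
    double q = 0.0;
    int idx;

    for (idx = 0; idx < n * m; idx++)
        q += p[idx] * nt[idx];
    return q;
}
```

The table is filled once, before rendering starts; only `eval_with_table` runs each time the control points move.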

The function (3) becomes:

$$Q(u, v) = Q_S(index) = \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} P_{i,j}\, NT_{i,j,index} \qquad (7)$$

At this point we can easily precalculate $NT$ and evaluate
the parametric surface in $(u_{index}, v_{index})$ with only
$nm$ multiplications. As for (4) we can define:

$$Cost_{Q_S}(n, m, k) = n\,m \qquad (8)$$

The function $NN_{i,j}$ (and the value of $NT_{i,j,index}$, which
is a function of $N$) is equal to zero for many values of $i$
and $j$. From (1) and (2) the function $N_{i,k}(u)$ (and, from
(6), the function $NN_{i,j}(index)$) is zero $\forall u \notin [t_i, t_{i+k}]$ and
$\forall v \notin [t_j, t_{j+k}]$. This suggests another improvement for
the algorithm. We can define the matrix $RIT_{n_s \times 2}$ (Row
Index Table) with $RIT_{index,0}$ equal to:

$$Min\{0 \le i < n \mid \exists\; 0 \le j < m,\; NT_{i,j,index} \ne 0\}$$

and $RIT_{index,1}$ equal to:

$$Max\{0 \le i < n \mid \exists\; 0 \le j < m,\; NT_{i,j,index} \ne 0\}$$

and the matrix $CIT_{n_s \times n \times 2}$ (Column Index Table) with
$CIT_{index,i,0}$ equal to:

$$Min\{0 \le j < m \mid NT_{i,j,index} \ne 0\}$$

and $CIT_{index,i,1}$ equal to:

$$Max\{0 \le j < m \mid NT_{i,j,index} \ne 0\}$$

$CIT$ is defined only for $RIT_{index,0} \le i \le RIT_{index,1}$.
The function (7) becomes:

$$FQ_S(index) = \sum_{i=RIT_{index,0}}^{RIT_{index,1}} \; \sum_{j=CIT_{index,i,0}}^{CIT_{index,i,1}} P_{i,j}\, NT_{i,j,index} \qquad (9)$$

The evaluation of the parametric surface in $(u_{index}, v_{index})$
costs:

$$(RIT_{index,1} - RIT_{index,0})(CIT_{index,i,1} - CIT_{index,i,0}) \qquad (10)$$

multiplications. The value of (10) is a function of the
knot vectors (the function $N_{i,k}(u)$ is zero outside the
range $[t_i, t_{i+k}]$) and it is not easy to estimate the value
of $Cost_{FQ_S}$. For simplicity we can use this definition:

$$Cost_{FQ_S}(n, m, k) = \alpha\, n\, m \quad \text{where } 0 < \alpha \le 1 \qquad (11)$$

with $\alpha$ as a function of the knot vectors of the nonuniform
B-spline. Normally $\alpha$ is mainly a function of the order of
the parametric surfaces. We can now evaluate the relative
efficiency of our new algorithm with respect to the classic
algorithm:

$$E = \frac{Cost_Q(n, m, k)}{Cost_{FQ_S}(n, m, k)} = \frac{Cost_N(k)}{\alpha} \qquad (12)$$

In a practical case $E$ is greater than 180-200.
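
The row and column index tables can be sketched in C for a single sample point. This is an illustration under stated assumptions, not the library code: the slice of the N Table is stored row-major, the control points are scalars, and the names `IndexRange`, `build_index_range`, `fast_eval` and `MAXN` are hypothetical. The function also counts the multiplications it performs so that the saving can be observed.

```c
#define MAXN 16

typedef struct {
    int row_min, row_max;   /* first and last row with a non-zero entry */
    int col_min[MAXN];      /* per row: first non-zero column */
    int col_max[MAXN];      /* per row: last non-zero column */
} IndexRange;

/*
 * Scans the n x m slice nt of the N Table for one sample point and
 * records the bounds of its non-zero entries. Returns 1 if at least one
 * entry is non-zero, 0 otherwise (including n outside 1..MAXN).
 */
int build_index_range(const double *nt, int n, int m, IndexRange *r)
{
    int i, j;

    r->row_min = 0;
    r->row_max = -1;
    if (n <= 0 || n > MAXN)
        return 0;

    r->row_min = n;
    for (i = 0; i < n; i++) {
        r->col_min[i] = m;
        r->col_max[i] = -1;
        for (j = 0; j < m; j++) {
            if (nt[i * m + j] != 0.0) {
                if (j < r->col_min[i])
                    r->col_min[i] = j;
                r->col_max[i] = j;
            }
        }
        if (r->col_max[i] >= 0) {
            if (i < r->row_min)
                r->row_min = i;
            r->row_max = i;
        }
    }
    return r->row_max >= 0;
}

/*
 * Sums p[i][j] * nt[i][j] only inside the recorded bounds and stores the
 * number of multiplications performed in *mults.
 */
double fast_eval(const double *p, const double *nt, int m,
                 const IndexRange *r, int *mults)
{
    double q = 0.0;
    int i, j;

    *mults = 0;
    for (i = r->row_min; i <= r->row_max; i++) {
        for (j = r->col_min[i]; j <= r->col_max[i]; j++) {
            q += p[i * m + j] * nt[i * m + j];
            (*mults)++;
        }
    }
    return q;
}
```

For a 4 x 4 slice whose only non-zero entries form a 2 x 2 block, the restricted sum needs 4 multiplications instead of 16 and returns the same value as the full sum.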

3  The  multi-processor  evolution 
of  the  algorithm 

The algorithm described in Section 2 is sequential but
can be easily extended for direct use on a parallel
machine. There is the theoretical possibility of implementing
the algorithm on a massive SIMD parallel computer and
on a MIMD parallel machine with local memory, but we
will describe only a theoretical and practical method for
using parametric surfaces on a symmetric multi-processor
system with shared memory and with a small number of
processors, because this type of computer is the most
widespread and available kind of parallel machine. On a
symmetric multi-processor system with shared memory and
a small number of processors, the cost of communication
and synchronization suggests calculating several parametric
surfaces in parallel rather than calculating a single surface on
multiple CPUs. For writing a library in a C-like language
in order to calculate nonuniform B-splines in parallel, we
suppose that the following functions are available:

• a function name of semaphore = InitSem(initial value
of the semaphore);

• a function V(name of semaphore) for doing a signal
on a semaphore;

• a function P(name of semaphore) for doing a wait on
a semaphore;

• a function SProc(name of a function), a variant of
the UNIX function fork, for creating a new process
that is a clone of the process that called SProc and
that shares the virtual address space of the parent
process.

Figure 1: A B-Splines based virtual Hand/Arm used for
force feedback experiments at Scuola Superiore S.Anna.

The shared data of the library are:

    #define MAX_BUF_SIZE 40

    semaphore ProdSem, CalcedSem, MutexSem;
    float ****NTable;
    int ***RIT, ****CIT;
    ParamSurfacePtr PSurface[MAX_BUF_SIZE];
    PolygonGridPtr SurfBuffer[MAX_BUF_SIZE];
    NormalGridPtr SurfNormalBuffer[MAX_BUF_SIZE];
    short SurfaceInBuffer,
          SurfaceCalced, NumSubdiv[2];

where NTable is the table of values defined by (6).
Please note that in a practical implementation, the set S,
defined in (5), is a regular grid of values between (0,0)
and (1,1):


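$$NTable[i][j][k][l] = NN_{i,j}\left(\frac{k}{NumSubd_u}, \frac{l}{NumSubd_v}\right)$$

where $NumSubd_u$ and $NumSubd_v$ are the numbers of
subdivisions of the interval [0, 1] in u and v. In the
library, NumSubdiv[0] will be equal to $NumSubd_u$ and
NumSubdiv[1] to $NumSubd_v$. The set S defined in (5)
now becomes:

$$S = \begin{Bmatrix}
(0,0) & \cdots & (0,1) \\
\vdots & \left(\frac{i}{NumSubd_u}, \frac{j}{NumSubd_v}\right) & \vdots \\
(1,0) & \cdots & (1,1)
\end{Bmatrix}$$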

RIT and CIT will be initialized according to the
definitions given in Section 2. The meaning of ProdSem,
CalcedSem, MutexSem, SurfaceInBuffer and SurfaceCalced will
be explained below. The initialization function, which is
called only at the startup of the program, is:

    void InitBSpline(short NumCtrlPnt[2],
                     float *KnotVectorU, short NumKVU,
                     float *KnotVectorV, short NumKVV,
                     short NumSubd[2], short NumProc)
    {
        int i;

        NumSubdiv[0] = NumSubd[0];
        NumSubdiv[1] = NumSubd[1];

        /* Precalculate the N Table and the two index tables. */
        NTable = AllocAndInitNTable(NumSubd,
                                    KnotVectorU, NumKVU,
                                    KnotVectorV, NumKVV, NumCtrlPnt);
        RIT = AllocAndInitRIT(NumCtrlPnt, NumSubd, NTable);
        CIT = AllocAndInitCIT(NumCtrlPnt, NumSubd, NTable);

        ProdSem = InitSem(0);
        CalcedSem = InitSem(0);
        MutexSem = InitSem(1);
        SurfaceInBuffer = SurfaceCalced = 0;

        /* Start the worker processes. */
        for (i = 0; i < NumProc; i++)
            SProc(SurfProc);
    }

Figure 2: The virtual Hand/Arm and the grids of control
points of the parametric surfaces.

In this function we precalculate NTable, CIT and RIT
as described in Section 2. The semaphore ProdSem is
created and initialized to 0; on this semaphore the parent will
signal to the child processes, generated by the
function SProc, when a new nonuniform B-spline is present in
the buffer PSurface. The semaphore CalcedSem is created
and initialized to 0; on this semaphore the child
processes will signal when they have finished calculating the
transformation from the parametric surface to the grid
of polygons. The semaphore MutexSem is used for
mutual exclusion by the child processes. SurfaceInBuffer
is the index of the first free cell in PSurface and it is equal
to the number of surfaces in the buffer; SurfaceCalced
is the index of the first parametric surface in the buffer
PSurface still to be calculated. The code of the library
is:

    void SurfProc(void)
    {
        int x, y, cn;

        for (;;) {
            P(ProdSem);              /* wait for a surface in the buffer */
            P(MutexSem);
            cn = SurfaceCalced;      /* claim the next surface */
            SurfaceCalced++;
            V(MutexSem);
            for (x = 0; x < NumSubdiv[0]; x++)
                for (y = 0; y < NumSubdiv[1]; y++)
                    FastEvalP(PSurface[cn],
                              x, y, SurfBuffer[cn][x][y]);
            CalcSurfaceNormal(SurfBuffer[cn],
                              SurfNormalBuffer[cn]);
            V(CalcedSem);            /* signal the parent: surface done */
        }
    }

    void CalcSurfacePoint(PSurfacePtr cp,
                          PolygonGridPtr pg, NormalGridPtr ng)
    {
        PSurface[SurfaceInBuffer] = cp;
        SurfBuffer[SurfaceInBuffer] = pg;
        SurfNormalBuffer[SurfaceInBuffer] = ng;
        SurfaceInBuffer++;
        V(ProdSem);                  /* wake one child process */
    }

    void WaitCalcSurface(void)
    {
        int i;

        for (i = 0; i < SurfaceInBuffer; i++)
            P(CalcedSem);            /* wait for every queued surface */
        SurfaceCalced = SurfaceInBuffer = 0;
    }

The function FastEvalP() is the implementation of
formula (11). In SurfProc() we call the function
CalcSurfaceNormal() to calculate the normals to the surface
SurfBuffer[cn]. A typical use of the functions in the
library is the following:

    main()
    {
        PSurfacePtr cp[NS];

        InitBSpline(...);
        for (;;) {
            CalcSurfacePoint(cp[0], ..., ...);
            ...
            CalcSurfacePoint(cp[NS-1], ..., ...);

            /* Do everything you want. */

            WaitCalcSurface();
            DrawSurfaces(cp, NS);
        }
    }


4  Modeling a virtual hand and
arm by parametric surfaces

A set of 40 nonuniform B-splines was used to design a
virtual arm and hand. We successfully used our mono- and
multi-processor algorithm for converting parametric surfaces
into polygon meshes in a practical application. We
conducted experiments using an Arm Exoskeleton System
(see [6]) and a Hand Exoskeleton System developed at
Scuola Superiore S. Anna [9].

The exoskeletons are capable of recording all arm and
hand movements and of replicating forces on the arm and
the hand. By moving the control points of the parametric
surfaces (see Fig. 2 for the grids of control points) according
to the data coming from the exoskeletons, we are able to
draw a virtual hand and arm that replicates and follows
the position and the shape (empirically representing the
deformation of the skin) of the real hand and arm.

We also developed a physically based simulation of the
virtual environment for replicating, by means of the
exoskeletons, the forces generated by virtual objects on
the real arm and hand (see [7] [8]).

We obtained 12-15 frames per second with the mono-processor
version of the algorithm on a Silicon Graphics
Personal Iris 4D/35TG with some very simple objects and
a virtual arm/hand drawn with 40 nonuniform B-splines
approximated by 4x4 polygon meshes (see Fig. 3 for 4x4
and 8x8 approximations), for a total of 640 polygons.

With a very complex environment like that shown in
Fig. 1 and 40 nonuniform B-splines approximated by 4x4
polygon meshes, for a total of 640 polygons, we obtained
16-20 frames per second with the multi-processor version
of the algorithm running on a Silicon Graphics Powers
440VGX (a multi-processor with four MIPS R3000 processors
and a fast graphics subsystem).

On the same machine we obtained more than 32 frames
per second, transforming more than 1300 parametric surfaces
per second from the parametric representation to polygon
meshes, for an application requiring a simple virtual
scenario.

During this research we ran a comparative test between
our single-processor algorithm described in (7) and
nurbssurface() of the SGI Graphics Library.
For the test of drawing a virtual hand on an Indigo R4000
150 MHz, we obtained the following results:


Figure  3:  The  parametric  surfaces  approximated  with  4x4 
polygons  (left)  and  8x8  polygons  (right). 
The GL function achieves a better visual appearance,
probably because it uses an adaptive approximation of
the parametric surfaces, but our preliminary version of the
algorithm is about 6 times faster than the equivalent GL
function. With the final version of the algorithm and a
multi-processor system, a performance increase of more
than one order of magnitude can be achieved. This
is a crucial point given the constraints imposed by a
real-time application such as Virtual Reality.

5  Conclusions 

This article has described a new approach
to the design of a virtual hand and arm in a Virtual
Environment. A complete description of a new mono- and
multi-processor algorithm has been presented. Experimental
results from a high-performance real-time practical
application have been described.
Abstract

Fourier representations are suggested for shape and texture
generation in interactive evolution in virtual environments.
With simple input, a user in a virtual environment can
achieve complexity (e.g. in shape/texture generation and
morphing) quickly, without knowledge of technical details
and without preprocessing.

Key words: interactive evolution, shape generation, texture
generation, morphing, Fourier series.

1  Introduction 

Virtual reality (VR) is a method of interacting with a
computer-simulated environment. VR is supposed [16, 5] to
contain a substantial portion of the following: surround
vision, stereo cues, viewer-centered perspective, real-time
interaction, tactile feedback, etc. To give a user the
experience of not merely being but acting there, intuitive
direct manipulation (for example, interactive volumetric
sculpting [7]) is important. Though the idea is simple,
making it work in practice is not trivial: the data stream
from the sculpting tool may be noisy, and the tool has to be
an absolute device instead of a relative one for the natural
mapping of the physical space of the tool to its screen space
representation [7]. Besides, significant lag between hand
motion and the screen update could be seriously
troublesome [13, 21].

* 2nd Eurographics Workshop on Virtual Environments,
31 January-1 February 1995, Monte Carlo, Monaco.

† Thanks to T. Poston for helpful discussions.

Even when hardware problems of this sort are all resolved,
direct manipulation might not always be the best choice
for interaction in virtual environments. For complex results
under the scheme of direct manipulation, complex input from
the user is necessary: a virtual sculpture, say, would
require very good eye-hand coordination and a great deal of
effort and patience. Creating sculptures beyond a certain
level of complexity can be all but impossible, especially
for a non-artist [1].

Though the results may not always be predictable, the
technique of interactive evolution [6, 17] lets the user
achieve complexity with a minimum of user input and
knowledge of details, which can augment and enhance
the power of direct manipulation.

The next section briefly explains the idea of genetic
algorithms, on which interactive evolution is based.
Applications of interactive evolution in computer graphics
are reviewed in Section 3. Section 4 reviews a use of it in
the CAVE virtual environment [4], which had drawbacks with
respect to elapsed time. In Sections 5 to 7 we describe the
potential of the Fourier representation for fast generation
of shapes and textures, and morphing between them.
Discussions and conclusions follow.

2  Genetic  Algorithms 

Genetic algorithms have been used to solve a wide variety
of problems [8]. Used as an optimization technique, genetic
algorithms have proven to be an effective way to search
extremely large or complex solution spaces. Among such
spaces are the vast domains of shape and texture. The
search for pleasing combinations of the two cannot be
carried out exhaustively, and aesthetic values cannot easily
be parametrized [5]. Since genetic algorithms do not rely
on problem-specific knowledge, they can be used to discover
solutions that would be difficult to find by other methods.
The genotype is the genetic information that codes for the
creation of an individual. The phenotype is the individual
itself, or the form that results from the development rules
and the genotype. The fitness function is used to evaluate
the relative quality of each phenotype, and the genotypes
corresponding to the phenotypes judged "best" are used as
the basis for the next generation [8, 17, 5].

3  Interactive  Evolution 

Interactive evolution provides a powerful new technique
for enabling human-computer collaboration. It is potentially
applicable to a wide variety of search problems, provided
the candidate solutions can be produced quickly by a
computer and evaluated quickly and easily by a human [1].
Since humans are often very good and fast at processing
and assessing pictures, interactive evolution is
particularly well suited to search problems whose candidate
solutions can be represented visually.

While traditional genetic algorithms usually use an
explicit analytic expression for a fitness function to be
evaluated by the computer [8], with interactive evolution
the user performs this step based on visual perception
[17, 1].

The beauty of interactive evolution is that the user does
not have to state or even understand an explicit fitness
criterion; the user need only be able to apply it. This
feature of interactive evolution is used very effectively
by Sims [17] in creating beautiful, abstract color images.
An initial population of images, either generated randomly
by the computer or input by the user, is displayed on the
screen. From the displayed set the user selects one image
for mutation or two images for mating: mutation is a
reproduction technique where a chosen parent's genotype (a
Lisp expression) is randomly altered, deriving children
which replace the current population. Mating or crossover
is the combination of the two genotypes. Each genotype in
the new generation contains a part of both parents'
genotypes. Once again the new population (with 'too long to
render' types weeded out) replaces the old population. The
mating and/or mutation operations are applied to the
selected images to produce a new set of progeny images that
supply the input for the next round of user selection. This
process is repeated multiple times to evolve an image of
interest to the user. Evolved images may be saved and later
recalled for mating with other evolved images.
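The mutation and mating operations can be sketched for a genotype stored as a fixed-length array of numbers. This is an illustrative assumption, not Sims' implementation: his genotypes are Lisp expressions, whereas the sketch uses a flat array of the kind a set of Fourier coefficients would give. The function names, the small random generator and the uniform crossover rule are all hypothetical choices made for the example.

```c
#include <stddef.h>

/* Small linear congruential generator returning a value in [0, 1).
 * Used only to keep the example self-contained and repeatable. */
static double NextRandom(unsigned long *state)
{
    *state = (*state * 1103515245UL + 12345UL) & 0x7fffffffUL;
    return (double)*state / 2147483648.0;
}

/* Mutation: each gene is perturbed with probability rate by a random
 * offset in [-amount, +amount). */
void Mutate(float *gene, size_t n, double rate, float amount,
            unsigned long *rng)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (NextRandom(rng) < rate)
            gene[i] += amount * (float)(2.0 * NextRandom(rng) - 1.0);
}

/* Mating (uniform crossover): each gene of the child is copied from
 * one of the two parents, chosen with equal probability. */
void Mate(const float *a, const float *b, float *child, size_t n,
          unsigned long *rng)
{
    size_t i;

    for (i = 0; i < n; i++)
        child[i] = (NextRandom(rng) < 0.5) ? a[i] : b[i];
}
```

In an interactive setting the fitness function never appears in code: the user's selection of one parent (for Mutate) or two parents (for Mate) plays that role.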

There have been other notable applications of interactive
evolution [1, 2, 20] since the inspiring work of Richard
Dawkins [6].

4  Related Work in a Virtual Environment

The technique of interactive evolution was also employed
in a virtual environment, the CAVE [4], to generate shape
and sound [5]. Besides being manipulated (e.g. rotated and
moved), objects can be mutated or mated with another object
to generate new objects of different shapes.

In this scheme the objects are isosurfaces
{E(x, y, z) = T}, where the genotype specifies the
mathematical function E, which can become quite complex.
Since the surface extraction is done by the CPU-intensive
Marching Cubes algorithm (with evaluation of E at all cubic
grid points), it is slow; and the large number of triangles
typical of a Marching Cubes surface gives long rendering
times.

These could be serious drawbacks in an application of
interactive evolution because the human's speed and
patience are limiting factors [1]. In the next section, we
suggest a Fourier representation which allows fast surface
generation, with no restriction on the class of surfaces
representable.

5  Fourier  Surfaces 

Fourier representations express a function in terms of an
orthonormal basis. One of the motivations for a basis
representation is that it allows us to express any object
as a weighted sum of a set of known functions.

A surface in 3D can be described explicitly by three
functions of two surface parameters:

\[ \mathbf{x}(u, v) = (x(u, v),\; y(u, v),\; z(u, v)) \]

where u and v vary over the surface and x, y and z are the
associated Cartesian coordinates. This representation
imposes no restriction on the class of surfaces
representable. In order to represent surfaces, a basis for
functions of two variables is needed. The following can be
used [3, 19]:

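\[
\begin{aligned}
cc_{l,m} &= \cos(2\pi l u)\,\cos(2\pi m v) \\
sc_{l,m} &= \sin(2\pi l u)\,\cos(2\pi m v) \\
cs_{l,m} &= \cos(2\pi l u)\,\sin(2\pi m v) \\
ss_{l,m} &= \sin(2\pi l u)\,\sin(2\pi m v)
\end{aligned}
\]

where $l, m = 0, 1, 2, \ldots$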


A function is then represented by:

\[
f(u, v) = \sum_{l=0}^{K} \sum_{m=0}^{K} \lambda_{l,m}
\left( a_{l,m}\, cc_{l,m} + b_{l,m}\, sc_{l,m}
     + c_{l,m}\, cs_{l,m} + d_{l,m}\, ss_{l,m} \right)
\]

where

\[
\lambda_{l,m} =
\begin{cases}
1 & l = 0,\ m = 0 \\
2 & l = 0,\ m > 0 \ \text{or}\ l > 0,\ m = 0 \\
4 & l > 0,\ m > 0
\end{cases}
\]

truncating the series at K. There are three sets of
parameters corresponding to the three coordinate functions,
a_x, b_x, c_x, d_x, a_y, b_y, c_y, d_y, a_z, b_z, c_z, d_z.
A genotype consists of these parameters. For smooth
surfaces, a few dozen low-frequency components will be
enough since high-frequency components, roughly speaking,
correspond to wiggles in surfaces.

These surfaces are expressed by parametric equations, so
no process of polygonalizing isosurfaces of an implicit
function is necessary. The parametric equations are defined
on a 2-D region while the implicit function is defined on a
3-D region: function evaluation has complexity O(n^2) for
the former and O(n^3) for the latter. Precomputing the
basis functions at mesh points is also possible, replacing
trigonometric function evaluation by table lookup and
simple arithmetic: it takes 5 seconds to generate each
shape in Fig. 1, Fig. 2 and Fig. 3 on a Silicon Graphics
IRIS INDIGO (32 x 32 mesh points are used and the Fourier
series are truncated at K = S). It should be noted that the
elapsed time is independent of the complexity of the
geometry generated.
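The table-lookup evaluation mentioned above can be sketched as follows for one coordinate function. This is a minimal sketch under assumptions, not the authors' code: the names Lambda and EvalFourier, the flat array layouts and the requirement that the caller supplies tables with cosTab[l * n + i] = cos(2 pi l i / n) and sinTab[l * n + i] = sin(2 pi l i / n) are all hypothetical.

```c
/* Weight lambda_{l,m} of the truncated double Fourier series. */
static float Lambda(int l, int m)
{
    if (l == 0 && m == 0)
        return 1.0f;
    if (l == 0 || m == 0)
        return 2.0f;
    return 4.0f;
}

/* Evaluate one coordinate function at mesh point (i, j) of an n x n
 * mesh, using precomputed trigonometric tables instead of calls to
 * cos() and sin(). The coefficient arrays a, b, c, d each hold
 * (K + 1) * (K + 1) values, indexed as l * (K + 1) + m. */
float EvalFourier(const float *a, const float *b,
                  const float *c, const float *d,
                  const float *cosTab, const float *sinTab,
                  int K, int n, int i, int j)
{
    float sum = 0.0f;
    int l, m;

    for (l = 0; l <= K; l++)
        for (m = 0; m <= K; m++) {
            int q = l * (K + 1) + m;
            float cl = cosTab[l * n + i], sl = sinTab[l * n + i];
            float cm = cosTab[m * n + j], sm = sinTab[m * n + j];

            sum += Lambda(l, m) * (a[q] * cl * cm + b[q] * sl * cm
                                 + c[q] * cl * sm + d[q] * sl * sm);
        }
    return sum;
}
```

The cost per mesh point depends only on K, not on the shape being generated, which is consistent with the remark that the elapsed time is independent of the complexity of the geometry.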

6  Fourier  Textures 

Fourier representation could also be used for synthesizing
textures. Instead of a Cartesian coordinate, T(u, v) is
defined as a two-dimensional greyscale texture, where u and
v are coordinates in texture space. If a color texture is
to be defined, three functions, R(u, v), G(u, v) and
B(u, v), have to be maintained for the color components.
Any arbitrary texture can be described with this Fourier
representation: the Fourier basis is complete.

As in interactive evolution for shape generation, the
Fourier coefficients again become the genotype. Note the
contrast between this approach and that of Sims [17]: the
motivation for using Lisp expressions as genotypes and
evolving complex functions is to overcome the limitation on
the set of possible phenotypes (i.e., textures) that arises
if genotypes consist of a fixed number of parameters and
fixed expression rules [17]. But there is no such
limitation with the Fourier representation; any texture
represented by a complex function consisting of Lisp
expressions can also be approximated arbitrarily well by a
Fourier description.

Sims uses a parallel supercomputer to render the images at
interactive speed [17] because a genotype for an image can
be a function which has evolved to be increasingly complex
(i.e., a sequence of image-processing functions). In
general, therefore, as images evolve, more time has to be
spent to render the images of increasingly complex
functions. However, in the case of the Fourier textures,
constant time rendering is possible if K is fixed.

7  Fourier  Morphing 

Morphing techniques for transforming images have
demonstrated remarkable results and have achieved
widespread use [11]. In this work, however, we rely on
interpolating the Fourier coefficients of the shape pairs
[9] rather than establishing a mapping between the vertices
and edges of the respective objects and introducing new
vertices and edges to obtain a one-to-one correspondence
and connectivity relationship between vertices [15].
Besides requiring neither user intervention nor
preprocessing, the morphing itself can be done fast:
rendering a Fourier surface is fast, as described in
Section 5, and obtaining intermediate shapes only requires
interpolating the few dozen Fourier coefficients.

The task is easy even when changing smoothly between
widely varying topologies. Note the contrast, especially in
terms of elapsed time, between this work and Fourier Volume
Morphing [9], which has to spend O(n^3 log n) time in the
Fourier transforms in addition to O(n^3) time for the
operations involved in blending datasets or rendering an
image from a dataset of size n x n x n: in our case the
Fourier coefficients are already available from the shape
synthesis step, so that fast morphing is possible.
Similarly, morphing between textures could be done fast by
interpolating the Fourier coefficients of the textures.
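Interpolating the coefficients amounts to a linear blend of two genotypes. The sketch below assumes that both shapes (or textures) store their coefficients in arrays of the same length and order; the function name MorphCoefficients and the blend parameter t are hypothetical.

```c
#include <stddef.h>

/* Linear interpolation between two sets of Fourier coefficients.
 * t = 0 gives the first shape, t = 1 the second; values in between
 * give the intermediate shapes of the morph. */
void MorphCoefficients(const float *from, const float *to,
                       float *out, size_t n, float t)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = (1.0f - t) * from[i] + t * to[i];
}
```

Each intermediate frame is then obtained by evaluating the Fourier series with the blended coefficients, so the per-frame cost of the morph is the interpolation of a few dozen numbers plus one surface evaluation.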

8  Conclusions and Further Work

We suggested Fourier representations for surface/texture
generation and morphing between them. Due to the
parametric/explicit expressions and precomputation of basis
functions, these can be done very fast compared to previous
work in which supercomputers are used or are being explored
for use. We are exploring building a Virtual Art Gallery
where users can not only view the art exhibited, but also
create their own complex works without complications from
interaction/interface problems.
Abstract

This paper presents a new method for solving the following problem: given a
polygonal model of some geometric object, generate several increasingly coarse
approximations of this object containing fewer and fewer polygons. The idea behind the
method is that small detail in the model is represented by many spatially close points. A
hierarchical clustering algorithm is used to generate a hierarchy of clusters from the
vertices of the object's polygons. The coarser the approximation, the more points are found
to lie within one cluster of points. Each cluster is replaced by one representative point
and polygons are reconstructed from these points. A static detail elision algorithm was
implemented to prove the practicability of the method. This paper shows examples of
approximations generated from different geometry models, pictures of scenes rendered
by a detail elision algorithm and timings of the method at work.

1  Introduction 

In several papers on recent developments in computer graphics the presented algorithms rely on the
availability of several approximations of decreasing complexity to the polygonal representation of
one geometric object (referred to as levels of detail - LODs - throughout this paper). Primarily for
performance reasons these algorithms choose one of the approximate representations of the objects
in the course of their work, thereby trading quality for speed. Examples are the visualization of
complex virtual environments [Funk93], 3D graphics toolkits [Rolf94] or indirect illumination
calculations [Rush93]. In the visualization of virtual environments, LODs make it possible to retain a
constant frame rate during navigation through the environment by adapting the detail of the visible
objects to the complexity of the visible part of the scene and the graphics performance of the
hardware used. Indirect illumination calculations attain performance gains by substituting different
LODs while calculating the energy exchange between two objects.

The  authors  of  these  papers  do  not  mention  how  to  automatically  obtain  the  different  LODs.  Either 
they  use  very  coarse  approximations  such  as  bounding  volumes  or  they  model  each  LOD  by  hand 
thereby  multiplying  the  effort  for  generating  the  geometry  database. 
In the field of surface reconstruction from laser range device data, algorithms have been developed
which make it possible to decimate the number of triangles used to represent the original surface.
However, these algorithms are very costly in terms of computational complexity and are less suitable
for geometry models from CAD software [Hopp94]. Other methods filter large triangle or polygon
meshes with the aim of retaining the detail of the digitizing process but do not generate several LODs
(see e.g. [DeHa91], [Turk92], [Schr92]). The presented method fills the gap between the
computationally and algorithmically complex surface reconstruction from laser range data with a
user-selectable number of triangles and the generation of models of varying complexity by hand. It
automatically generates different LODs from CAD geometry models. The original model is referred
to as level 0. Coarser LODs are referred to as level 1, level 2 and so on.

The goal in designing the method was execution speed while keeping a close resemblance to the
original model. In coarser LODs the number of polygons must decrease significantly. Small details of
the model can be left out but the overall structure of the object should stay the same with as few
polygons as possible. It is important to keep the silhouette of the object, both to minimize annoying
artifacts during blending between different LODs and to keep low the error which is introduced into
the algorithm making use of the LODs.

The method uses a hierarchical clustering algorithm to perform the task of removing the detail from
the models and generating coarser approximations to the models. In the first stage the clustering
algorithm generates a hierarchy of clusters from the points in the model. Different algorithms exist to
generate such a hierarchy - the "centroid" or "unweighted group pair" method [Snea73] is used in the
current implementation. Depending on the desired degree of approximation the method traverses the
calculated cluster tree to a specific depth and either keeps the geometry model or replaces parts by
coarser approximations.
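A single agglomeration step of a centroid-style clustering can be sketched as follows. This is an illustrative sketch, not the implementation used in the paper: the Cluster type, the function names and the brute-force O(n^2) search for the closest pair are assumptions, and a complete version would also record each merge in order to build the cluster tree.

```c
typedef struct {
    float x, y, z;   /* centroid of the cluster */
    int size;        /* number of original points it contains */
} Cluster;

/* Squared Euclidean distance between two cluster centroids. */
static float Dist2(const Cluster *a, const Cluster *b)
{
    float dx = a->x - b->x, dy = a->y - b->y, dz = a->z - b->z;

    return dx * dx + dy * dy + dz * dz;
}

/* Merge the two closest clusters in c[0..n-1] and return the new
 * cluster count. The merged centroid is the mean of the two old
 * centroids weighted by cluster size. */
int MergeClosest(Cluster *c, int n)
{
    int i, j, bi = 0, bj = 1, total;
    float best;

    if (n < 2)
        return n;
    best = Dist2(&c[0], &c[1]);
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (Dist2(&c[i], &c[j]) < best) {
                best = Dist2(&c[i], &c[j]);
                bi = i;
                bj = j;
            }
    total = c[bi].size + c[bj].size;
    c[bi].x = (c[bi].size * c[bi].x + c[bj].size * c[bj].x) / total;
    c[bi].y = (c[bi].size * c[bi].y + c[bj].size * c[bj].y) / total;
    c[bi].z = (c[bi].size * c[bi].z + c[bj].size * c[bj].z) / total;
    c[bi].size = total;
    c[bj] = c[n - 1];   /* fill the gap with the last cluster */
    return n - 1;
}
```

Calling MergeClosest repeatedly until one cluster remains yields the sequence of merges from which a hierarchy can be built; each intermediate cluster count corresponds to one possible layer of the tree.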

The current implementation of the proposed method is suitable for calculating LODs from polygonal
objects containing several thousand data points. In an interactive viewer the generated LODs are used
to render environments of geometric objects at interactive frame rates. The viewer chooses an LOD for
each visible object depending on the size of its on-screen projection without introducing highly
noticeable artifacts into the rendered views. The program allows navigation through the environment
as well as a close inspection of the objects' LODs generated by the method.
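How a viewer might map projected size to a level can be sketched as follows. The selection rule here is purely hypothetical and is not described in the paper: the function name SelectLod, the parameter fullDetailPixels and the rule of moving one level coarser for every halving of the projected size are assumptions chosen only to make the idea concrete.

```c
/* Choose a level of detail from the projected size of an object.
 * Level 0 (the original model) is used when the projection is at
 * least fullDetailPixels across; each halving of the projected size
 * moves one level coarser, down to the coarsest level available. */
int SelectLod(float projectedPixels, float fullDetailPixels,
              int numLevels)
{
    int level = 0;
    float threshold = fullDetailPixels;

    while (level < numLevels - 1 && projectedPixels < threshold) {
        threshold *= 0.5f;
        level++;
    }
    return level;
}
```

With fullDetailPixels set to 100 and three levels available, an object 200 pixels across is drawn at level 0, one 60 pixels across at level 1 and one 10 pixels across at level 2.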

2  Previous  work 

Hoppe et al. [Hopp94] have presented a method to solve the following problem: starting from a set
of three-dimensional data points and an initial triangular mesh they produce a mesh of the same
topology which fits the data well and has a smaller number of vertices. They achieve this by
minimizing an energy function which explicitly models the competing desires of conciseness of
representation and fidelity to the data by using three basic mesh transformations to vary the structure
of the triangle mesh in an outer optimization loop. For each triangle mesh found they optimize the
energy function by slight changes to the vertex positions of the triangle mesh. The method is fairly
complex and demanding as far as computational and implementational efforts are concerned.

DeHaemer et al. [DeHa91] have presented two methods for approximating or simplifying meshes of
quadrilaterals topologically equivalent to regular grids. The first method applies adaptive subdivision
to trial polygons which are recursively divided into smaller polygons until some fitting criterion is
met. The second method starts from one of the polygons in the mesh and tries to grow it by
combining it with one of its neighbours into one bigger polygon. Growing the polygon stops when the
fitting criterion is violated. These methods work well for large regular meshes and achieve reductions
in polygon numbers down to 10% for sufficiently smooth surfaces.
Turk's re-tiling method [Turk92] is best suited for polygonal meshes which represent curved
surfaces. It generates an intermediate model containing both the vertices from the original model and
new points which are to become the vertices of the re-tiled surface. The new model is created by
removing each original vertex and locally re-triangulating the surface in a way which matches the
local connectivity of the initial surface. It is worth mentioning that models containing nested levels of
vertex densities can be generated and that smooth interpolation between these levels is possible.

Schroeder et al. [Schr92] deal with decimation of triangle meshes in the following way: in multiple
passes over an existing triangle mesh, local geometry and topology are used to remove vertices which
pass a distance or angle criterion. The holes left by the vertex removal are patched using a local
triangulation process. Using their approach Schroeder et al. successfully decimate triangle meshes
generated with the marching cubes algorithm down to 10%.

All these approaches have in common that they start out from polygon meshes which contain a lot of
redundancy: they exploit the fact that in most areas of the surfaces curvature is low and vertices or
triangles can be left out or combined without losing features of the surface. Vertices in such areas
fulfil the preconditions under which simplifications to the mesh are made. However, such
preconditions for simplification are rarely met in human-modelled CAD objects where flat surfaces are
constructed from large polygons. They cannot be further simplified with these strategies. Moreover,
the approaches are quite complex to implement and computationally intensive as they all involve
re-triangulation and multiple passes over the polygon mesh.

It has been pointed out by a reviewer that the proposed method is similar to the method introduced by
Rossignac et al. [Ross92]. Rossignac uses a regular grid to find points in close proximity, which is
disadvantageous in the case of greatly differing object sizes. The method proposed in this paper uses
a hierarchical object-adaptive clustering scheme instead, which can deal with such differences and
does not introduce any grid size into the generated models. The method does not require the polygon
mesh to contain a lot of redundancy nor does it require the mesh to be of any predetermined type or
topology. In fact it is well suited to continue from where the above algorithms finish. It works on
polygonal objects of several thousand vertices as in hand-modelled CAD objects and explicitly
generates several LODs from these objects in one pass over the object's geometry while retaining the
overall shape and size of the objects. It is designed to be fast and efficient on such models and is well
suited to generate the LODs while the object database is loaded into memory.

3  Overview  of  the  Approach 

This approach implements a reasonably fast method to generate several LODs from polygonal object
models. This paper's terminology refers to the original model as level 0. The algorithm was not
required to keep the topology of the geometry exactly, as did the algorithms mentioned in the previous
section. Nevertheless the generated LODs resemble the original model closely, though with fewer and
fewer polygons. The coarsest LOD should not contain more than a few dozen polygons for an original
object consisting of several thousand polygons.

The  algorithm  can  be  described  as  follows: 

•  Apply  a  hierarchical  clustering  algorithm  to  the  vertices  of  the  object  model  to  produce  a  tree 
of  clusters. 

•  For each LOD generate a new (less complex) object model using cluster representatives instead
of original polygon vertices.

•  Remove multiply occurring, redundant primitives from each LOD.

In the second stage a layer in the cluster tree is determined which describes the way the LOD
approximates the original model. Using the clusters of this layer, each polygon has its vertices replaced by
the representatives of the clusters they belong to. This may leave the number of vertices unchanged (if all
points fall into different clusters), the number of vertices may be reduced, the polygon may collapse
into a line stroke, or its new representation may be a single point. In this way formerly unconnected
parts of the model may eventually become connected when separate points of different polygons
fall into one cluster and are, therefore, mapped onto one cluster representative. However, as the
clustering algorithm only clusters spatially close points, the overall appearance of the object is
preserved to the degree of approximation desired in the current LOD. In particular, polygons
become bigger if their points are moved apart through clustering. Therefore, the surface of the object
will never be torn apart.
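
The vertex replacement described above can be illustrated with a minimal sketch. It assumes that
polygons are stored as lists of vertex indices and that the chosen layer of the cluster tree is given
as a mapping from each vertex to its cluster representative; the names collapse_polygon, classify
and representative are hypothetical and do not come from the paper.

```python
def collapse_polygon(polygon, representative):
    """Replace each vertex by its cluster representative and drop repeats
    between cyclically adjacent vertices."""
    mapped = [representative[v] for v in polygon]
    reduced = []
    for v in mapped:
        if not reduced or reduced[-1] != v:
            reduced.append(v)
    # The vertex list is cyclic, so the last entry may repeat the first.
    if len(reduced) > 1 and reduced[0] == reduced[-1]:
        reduced.pop()
    return reduced


def classify(vertices):
    """Name the primitive that a collapsed vertex list describes."""
    distinct = len(set(vertices))
    if distinct >= 3:
        return "polygon"
    return "line" if distinct == 2 else "point"


# Hypothetical layer: vertices 0 and 1 share one cluster, 2 and 3 stay alone.
representative = {0: 0, 1: 0, 2: 2, 3: 3}
quad = collapse_polygon([0, 1, 2, 3], representative)
```

In this example the quadrilateral loses one vertex and remains a polygon, while a triangle over
vertices 0, 1 and 2 would collapse into a line stroke.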

4  Hierarchical  Clustering 

Clustering algorithms work on a set of objects with associated description vectors, i.e. a set of points
in multidimensional space. A dissimilarity measure is defined on these, which is a positive semidefinite
symmetric mapping of pairs of points and/or clusters of points onto the real numbers (i.e.
$d(i,j) \ge 0$ and $d(i,j) = d(j,i)$ for points or clusters $i$ and $j$). Often (as in this case) the
stronger notion of a distance is used, where in addition the triangle inequality is satisfied (i.e.
$d(i,j) \le d(i,k) + d(k,j)$), as for the Euclidean distance. The general algorithm for hierarchical
clustering may be described as follows (although very different algorithms exist for the same
hierarchical clustering method [Murt83]):

Step  1  Examine  all  inter-point  dissimilarities,  and  form  a  cluster  from  the  two  closest  points. 
Step  2  Replace the two points clustered by a representative point.

Step  3  Return  to  Step  1  treating  constructed  clusters  the  same  as  remaining  points  until  only  one 
cluster  remains. 

Prior  to  Step  1  each  point  forms  a  cluster  of  its  own  and  serves  as  the  representative  for  itself.  The 
aim  of  the  algorithm  is  to  build  a  hierarchical  tree  of  clusters  where  the  initial  points  form  the  leaves 
of  the  tree.  The  root  of  the  tree  is  the  cluster  containing  all  points. 

Whenever two points (or clusters) are joined into a new cluster in Step 2, a new node is created in the
cluster tree having the two clusters $i$ and $j$ as its two subtrees. For the new cluster a representative is
chosen and a dissimilarity measure is calculated describing the dissimilarity of the points in the
cluster. The representative $g$ of the new cluster is calculated from

$$ g = \frac{|i|\, g_i + |j|\, g_j}{|i| + |j|}, \qquad |i| = \text{number of points in cluster } i. $$

The dissimilarity measure $d$ equals the distance of the two joined clusters $i$ and $j$:

$$ d = d(i,j) = \lVert g_i - g_j \rVert . $$
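
As an illustration of Steps 1 to 3, the following is a minimal sketch of the naive variant of the
algorithm, without the nearest neighbour bookkeeping discussed below. It assumes the Euclidean
distance between representatives as the dissimilarity and the size-weighted average as the merged
representative; the Cluster class and the function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Cluster:
    rep: Tuple[float, ...]            # representative g of the cluster
    size: int                         # number of points in the cluster
    dissimilarity: float = 0.0        # distance of the two clusters joined here
    left: Optional["Cluster"] = None
    right: Optional["Cluster"] = None
    index: Optional[int] = None       # index of the original point (leaves only)


def merge(a, b):
    """Join two clusters; the new representative is the size-weighted average."""
    n = a.size + b.size
    rep = tuple((a.size * x + b.size * y) / n for x, y in zip(a.rep, b.rep))
    return Cluster(rep, n, math.dist(a.rep, b.rep), a, b)


def build_cluster_tree(points):
    """Naive agglomerative clustering of a non-empty list of points."""
    active = [Cluster(tuple(p), 1, index=i) for i, p in enumerate(points)]
    while len(active) > 1:
        # Step 1: exhaustive search for the closest pair of representatives.
        best = None
        for i in range(len(active)):
            for j in range(i + 1, len(active)):
                d = math.dist(active[i].rep, active[j].rep)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Step 2: replace the pair by the merged cluster (delete j first, j > i).
        node = merge(active[i], active[j])
        del active[j]
        del active[i]
        active.append(node)
    return active[0]  # Step 3 ends when a single root cluster remains.


root = build_cluster_tree([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)])
```

The exhaustive pair search makes this sketch much slower than the implementation described in the
text; it is meant only to make the structure of the cluster tree concrete.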

The run-time complexity of this approach can be analyzed as follows: Step 1 is the most complex in
the above algorithm ($O(N^2)$ in a naive approach). The time required to search for the closest pair of
points can be decreased by storing the nearest neighbour of each point or cluster. Therefore, a
BSP-tree is built from the points at a cost of $O(N \log N)$ to facilitate the initialization of the
nearest neighbour pairs (at the equal cost of $O(N \log N)$).

Each time Step 2 is performed the nearest neighbour data is updated at a cost of $O(N)$ on
average, yielding a total complexity of the algorithm of $O(N^2)$. As stated in the paper of Murtagh
[Murt83], hierarchical clustering can be implemented with a complexity of less than $O(N^2)$, so a
future implementation should incorporate such optimizations or one of those clustering algorithms to
make the method suitable for dealing with more complex geometry models.

5  Finding  the  Approximative  Polygonal  Model 

The tree of clusters generated by the hierarchical clustering algorithm is used as the input to the next
stage of the method: the automatic generation of several LODs. Starting from the highest LOD
(the coarsest approximation) the algorithm proceeds towards level 0 (the original model). For each level
to be generated, a size of object detail is determined which can be ignored at this level. This size is
used to choose nodes in the cluster tree which have dissimilarity measures (i.e. cluster distances)
of similar magnitude. One eighth of the space diagonal of the object's bounding box works well in most
cases for the coarsest approximation.

Next, a descent into the cluster tree is made, starting from the root, until a node is found whose
dissimilarity measure is smaller than the negligible detail size of the current LOD. For each such
cluster a representative is calculated and all the points in the cluster are replaced by this
representative. The point which is furthest away from the object's centre is taken as the cluster representative.
Although other methods are possible (see Section 8, “Results and Timings”), this one delivered the
most satisfying results. It keeps the overall shape and size of the objects and has the additional
advantage that no new vertices need to be introduced into the object's model. Using the cluster average
might result in better looking shapes of the LOD, but it tends to produce LODs which are increasingly
smaller than the original object, which results in annoying effects during rendering.
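
The descent and the choice of representative can be sketched as follows. The sketch assumes a
binary cluster tree in which every node stores the original points below it and the dissimilarity
of the merge that created it (0 for leaves); the Node class and the names select_layer and
furthest_from_centre are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class Node:
    points: List[Point]               # all original points below this node
    dissimilarity: float = 0.0        # 0 for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def select_layer(root, detail_size):
    """Descend from the root and stop at the first nodes whose dissimilarity
    is smaller than the negligible detail size (or at leaves)."""
    layer, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.left is None or node.dissimilarity < detail_size:
            layer.append(node)
        else:
            stack.extend((node.left, node.right))
    return layer


def furthest_from_centre(node, centre):
    """Representative: the cluster point furthest from the object's centre."""
    return max(node.points, key=lambda p: math.dist(p, centre))


a, b, c = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)
ab = Node([a, b], 1.0, Node([a]), Node([b]))
root = Node([a, b, c], 9.5, ab, Node([c]))
layer = select_layer(root, 2.0)
```

With a detail size of 2.0 the points a and b are merged into one cluster while c stays on its own;
since an existing point is used as representative, no new vertex has to be created.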

From the layer of clusters found in the cluster tree a new geometric object with the same number of
(possibly degenerate) polygons is calculated; only the point coordinates of some of the polygons'
vertices change. Now each polygon is examined to find out whether it is still a valid polygon or whether it has
collapsed into a line stroke or a point. Moreover, polygons with more than 3 vertices might need to be
triangulated, as their vertices are in general no longer coplanar. Polygon normals are calculated anew for each
LOD. In models of tessellated curved surfaces the normals may be interpolated at the vertices if
desired.


6  Cleaning  up  the  New  Model 

As polygons may collapse into line strokes and points, care must be taken not to incorporate
unnecessary primitives into the LODs. For example, lines or points which appear as an edge or vertex of a
polygon in the current LOD can be discarded. Even certain polygons can be left out.

In a second pass over the new approximated geometry those primitives in the LODs are identified
which can be removed without changing the look of the LOD. All points which occur as a vertex of a
valid polygon or of a valid line are flagged, as are all lines which occur as an edge of a valid polygon.
Assuming that the models are composed of closed polyhedra (as is usually the case), all pairs of
polygons which contain the same vertices in reversed sequence and all polygons which contain the
vertices of another valid polygon in the same orientation can be flagged as well.
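
A minimal sketch of this flagging pass is given below. It assumes that primitives are stored as
tuples of vertex indices and covers only three of the rules: points on polygons or lines, lines on
polygon edges, and pairs of polygons with the same vertices in reversed sequence. The rule for
polygons contained in another polygon with the same orientation is omitted, and all names are
hypothetical.

```python
def polygon_edges(poly):
    """Undirected edges of a cyclic vertex sequence."""
    return {frozenset((poly[k], poly[(k + 1) % len(poly)])) for k in range(len(poly))}


def canonical(poly):
    """Rotate a cyclic vertex sequence so that it starts with its smallest index."""
    k = poly.index(min(poly))
    return tuple(poly[k:] + poly[:k])


def flag_redundant(polygons, lines, points):
    """Return (flagged points, flagged lines, indices of flagged polygons)."""
    polys = [tuple(p) for p in polygons]
    edges = set().union(*(polygon_edges(p) for p in polys))
    used = {v for p in polys for v in p} | {v for l in lines for v in l}
    flagged_points = [v for v in points if v in used]
    flagged_lines = [l for l in lines if frozenset(l) in edges]
    seen, flagged_polys = {}, set()
    for idx, p in enumerate(polys):
        reverse = canonical(tuple(reversed(p)))
        if reverse in seen:
            # Same vertices in reversed sequence: both faces cancel out.
            flagged_polys.update((seen[reverse], idx))
        seen.setdefault(canonical(p), idx)
    return flagged_points, flagged_lines, sorted(flagged_polys)


result = flag_redundant(
    polygons=[(0, 1, 2), (2, 1, 0), (3, 4, 5)],
    lines=[(0, 1), (6, 7)],
    points=[2, 6, 8],
)
```

Returning flags instead of deleting primitives keeps both options described in the text open:
flagged primitives can be dropped for speed or kept to preserve the correspondence with the
original model.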

For  the  sake  of  speed  these  flagged  primitives  can  be  removed  from  the  LOD  geometry.  However,  if 
it  is  required  to  retain  the  relationship  between  the  primitives  in  the  LODs  and  the  original  geometry 
model,  the  flagged  primitives  can  be  retained  and  need  not  be  processed. 

The next (finer) LOD is generated in the same way, only the size of negligible object details is
decreased by a certain factor (in the current implementation the factor is 2, which proved to work
well). Therefore, a lower layer is chosen in the cluster tree where clusters are smaller and represent
smaller features. As a result less detail of the original model is left out because fewer points fall into
one cluster.
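
The sequence of negligible detail sizes can be written down in a few lines. This sketch assumes an
axis-aligned bounding box given by two corner points and uses the values quoted in the text (one
eighth of the diagonal for the coarsest level, factor 2 between levels); the function name
detail_sizes is hypothetical.

```python
import math


def detail_sizes(bbox_min, bbox_max, num_levels, factor=2.0):
    """Negligible detail size per LOD, listed from the coarsest level to the finest."""
    diagonal = math.dist(bbox_min, bbox_max)
    coarsest = diagonal / 8.0
    return [coarsest / factor ** k for k in range(num_levels)]


sizes = detail_sizes((0.0, 0.0, 0.0), (8.0, 0.0, 0.0), 3)
```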

Different LODs generated by the method are shown in Fig. 1. Top left shows the original model; the
other three groups of pictures show higher LODs of the object in a close-up (big pictures) and
magnified versions of both the original model (small left) and the LOD (small right) from a distance for
which this LOD is appropriate.

7  A  Viewing  System  Based  on  Levels  of  Detail 

A viewing system using the LODs was implemented both to check the results of the method visually
and to demonstrate the usability of the LODs in rendering complex virtual environments.
Generally speaking, the incorporation of LODs into the viewer resulted in sufficient speedup for
near-interactive frame rates (5-10 Hz) on a graphics system with a peak performance of 25K polygons per
second.

Prior to drawing the scene the system determines the visible objects by classifying their bounding
boxes against the viewing frustum. For the potentially visible objects the length of the projection of the
bounding box diagonal is calculated to give a measure of the LOD at which the object should be rendered.
Then all the polygons facing the viewer are rendered using the Z-buffer algorithm.
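
The text does not state how the projected diagonal length is mapped to a level, so the following is
only one possible sketch. It assumes a pinhole projection and, by analogy with the factor 2 between
detail sizes, moves one level coarser each time the projected length halves; the parameter
full_detail_px and both function names are hypothetical.

```python
import math


def projected_length(diagonal, distance, focal_length_px):
    """Approximate on-screen length in pixels of a segment of world length
    `diagonal` seen at `distance` (which must be positive)."""
    return diagonal * focal_length_px / distance


def choose_lod(diagonal, distance, focal_length_px, num_levels, full_detail_px=400.0):
    """Level 0 above `full_detail_px`, then one level coarser per halving."""
    length = projected_length(diagonal, distance, focal_length_px)
    if length >= full_detail_px:
        return 0
    level = int(math.floor(math.log2(full_detail_px / length))) + 1
    return min(level, num_levels - 1)


near = choose_lod(1.0, 1.0, 800.0, num_levels=5)
far = choose_lod(1.0, 1000.0, 800.0, num_levels=5)
```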


Fig. 2  Living room: left with, right without LODs (left 6810, right 13847 polygons)


8  Results  and  Timings 

The proposed method for the automatic generation of LODs works well for polygonal object models
of some 1000 vertices. Models can contain any kind of polygons (convex, concave, general) as long
as the triangulation algorithm for approximated polygons can handle them. However, convex
polygons are preferable for high rendering speed. Lines and points are legal primitives as well. The
objects' meshes should be closed, although a violation of this requirement does not prevent the
algorithm from being used on the model.

Fig. 1  Levels of Detail of Plant, from the original model to Level n (6064, 3674, 1225, 339 polygons/lines)


It is possible to vary the method in several ways to better match the requirements of the application
for which the LODs are generated. First, the number of LODs generated for each object can be
adapted by changing the factor by which the dissimilarity measure is updated from level to level.
Second, the representative for each cluster can be chosen from a variety of possibilities
(cluster average, point in cluster closest to cluster average, point in cluster furthest from object
centre, random point in cluster, ...). Third, the primitives generated by the method can be made to
include points, lines or polygons or any combination. Fourth, the degenerate polygons can be left
within the LODs to keep the relationship between the primitives in the original model and the LODs.
This is useful for transferring information from the primitives of one LOD to the other (e.g. color,
texture, ...).

The method was applied to a wide variety of objects, achieving acceptable results as far as speed and
quality of the approximation are concerned. The resulting LODs are not easily distinguished from the
original models when viewed from increasing distances but contain far fewer polygons than the
original models.

Fig. 3 shows some more examples of objects and their LODs from close up as well as from a suitable
viewpoint, i.e. an appropriate distance for the use of the respective LOD. The statistics of the
algorithm are summarized in Table 1 for the objects in Fig. 1 and Fig. 3. They were obtained on a Mips
R3000 CPU.


Object    Level 0        Level 1      Level 2      Level 3      Level 4      Time (sec)
          Polys/Points   (polygons)   (polygons)   (polygons)   (polygons)   Total for 4 LOD

Lamp      790/3084       492 (62%)    210 (27%)    63 (8%)      37 (5%)      2.9
Fitting   1581/5301      1325 (43%)   833 (27%)    225 (7%)     55 (2%)      4.2
Shelves   1783/6628      740 (42%)    472 (26%)    454 (25%)    231 (13%)    3.0
Plant     6064/25926     3674 (61%)   1225 (20%)   339 (6%)     57 (1%)      57.0

Table 1  Statistics for the new method


9  Future  Work 

As already mentioned briefly, we are investigating two areas of further research. First, we want to
speed up the hierarchical clustering algorithm further by making best use of the optimizations
described by Murtagh [Murt83]. This work will decrease the order of complexity of the clustering
algorithm below $O(N^2)$, making it possible to deal with more complex object models in less time.

Second, we want to build a new mode into our viewer which guarantees a constant frame rate through
the method described by Funkhouser et al. [Funk93]. We hope to improve on their results regarding
variations in the frame rate, as our object models are well structured and probably allow faster and better
estimation of the rendering complexity of the visible objects.

Further research is needed to increase the flexibility of our method and to further improve the overall
look of the LODs. Moreover, we are investigating the applicability of LODs to other application
areas.

Abstract

FACS (Facial Action Coding System) was proposed by the psychologists Paul Ekman and Wallace V.
Friesen to describe facial expressions. It states that a human facial expression is composed of 46 muscular
movements called AUs (Action Units). Here, a mesh model of a human face is defined and each action
unit is designed to deform the mesh model according to FACS. Using the mesh model, one can
create a facial expression by asserting a set of action units and their intensity values. In this paper, we
propose a method using a GA (Genetic Algorithm), widely known as a function optimizer, to extract the
muscular information of an arbitrary facial expression, where the expression may be input by a 3D
measuring device or a 2D image. By using the precise muscular information, we can construct more
realistic facial expressions, and this process contributes toward creating an artificial agent inhabiting the
virtual world, a virtual agent.


1.  Introduction 

We receive external information through the five senses, and most of the information is received by the visual
sensing system. There may be many methods to provide visual information, and the look of a human face is a good
example of a means to express a mood or thought of a person. Many meanings can be transmitted by a facial
expression. The look reflects the emotion of an individual: happiness, anger, sadness and many other delicate
nuances. If the computer expresses its internal state to us using the look of a human face, we may construct an
interface that is more familiar to us. The machine having a human face! The environment for us to communicate with
machines as if they were our friends! This is the human-computer interface we are pursuing as a part of the virtual
reality (VR) research at Korea Institute of Science and Technology (KIST). As a part of VR research, we are
investigating an artificial agent inhabiting a virtual world. The agent may communicate with us in the external
world by human facial expression.

The purpose of this paper is to present an experimental result in finding the components of a facial
expression automatically. Each component is called an action unit, which defines what part of a face changes and
is based on the Facial Action Coding System (FACS) [1].

1.1  Facial  Action  Coding  System  (FACS) 

FACS was suggested by the psychologists Paul Ekman and Wallace V. Friesen in the 1970's in order to describe
human facial expressions. According to FACS, there are 46 primitive muscular movements in a human face,
called action units, and a facial expression can be synthesized by a combination of action units. For example,
the expression describing happiness is composed of the muscular movements which raise the inner eyebrows,
pull the corners of the lips, and drop the jaw. Figure 1 shows appearance changes due to combinations of action
units and their relative strengths (or intensities), and Table 1 describes the kinds of action units and their
functions.


(a) Expressionless face
(b) Happiness: AU combination 1(0.4) + 12(0.4) + 26(0.4)
(c) Sadness: AU combination 1(0.3) + 4(0.6) + 15(0.4)
(d) Surprise: AU combination 1(0.7) + 2(0.7) + 5(0.7) + 26(0.7)

Figure 1. Appearance changes due to AU combinations


No.  AU Function                   No.  AU Function            No.  AU Function        No.  AU Function
 1   Inner Brow Raiser             13   Sharp Lip Puller       25   Lips Part          36   Tongue Bulge
 2   Outer Brow Raiser             14   Dimpler                26   Jaw Drop           37   Lip Wipe
 4   Brow Lowerer                  15   Lip Corner Depressor   27   Mouth Stretcher    38   Nostril Dilator
 5   Upper Lid Raiser              17   Chin Raiser            28   Lips Suck          39   Nostril Compressor
 6   Cheek Raiser                  18   Lip Pucker             29   Jaw Thrust         41   Lid Droop
 7   Lid Tightener                 19   Tongue Show            30   Jaw Sideways       42   Slit
 8   Lips Toward                   20   Lip Stretcher          31   Jaw Clencher       43   Eyes Closed
 9   Nose Wrinkler                 21   Neck Tightener         32   Bite               44   Squint
10   Upper Lip Raiser              22   Lip Funneler           33   Blow               45   Blink
11   Nasolabial Furrow Deepener    23   Lip Tightener          34   Puff               46   Wink
12   Lip Corner Puller             24   Lip Pressor            35   Cheek Suck

Table 1. Action Units


1.2  Creation  of  Facial  Expressions 

If one has the data of one basic expressionless face, one can modify the expression by applying several action units
with their intensities. A face is modeled by triangle meshes, and applying an action unit means changing the
coordinates of the vertices associated with that action unit, which characterizes the muscular movement; the AU
intensity determines the degree of the change of the coordinates. One can create any facial expression at will
if one knows the composing elements of the facial expression. Unfortunately, FACS does not provide a theory
of how to combine the action units to create a desired facial expression. That is, in order to create a facial
expression, one must search for the right combination by trial and error, and the search process is very time
consuming. For example, if each of $n$ action units has 10 intensities, we have to search $10^n$ combinations per
facial expression in the worst case. Here, we use a genetic algorithm (GA) to search for the target expression [3].

The search space of the problem we are to solve is very large, so we need a heuristic optimal search
technique. Besides, the function to be optimized is multi-modal and may be discontinuous [5]. That is, it has many
local peaks in its search region, which makes it difficult to locate the global peak with a simple method like
hill-climbing, a direct calculus-based search technique that finds the global minimum or maximum only in convex
spaces. Another widely used optimization method is simulated annealing [6][7],
which offers a way to overcome this major drawback of calculus-based methods, but the price to pay is a huge
computation time, and the searching process is basically sequential in nature. To avoid these drawbacks, we apply
GA to our problem, which avoids converging to local maxima by starting from multiple initial values and can
easily be implemented in parallel.


1.3  Motivation 

-  Facial mesh data from face sculpturing robot system [2]


At Expo'93, held in Taejon, Korea, the CAD/CAM laboratory of Korea Institute of Science and Technology
demonstrated a face sculpturing robot system that obtains the 3D contour of a person's face using projection
type multislit beam topography [2]. The facial features of a person are extracted from the contour of the face by
approximately matching the model face in Figure 3-a with the measured data. All the facial features are defined
a priori for the model face. Then, the recognized face data is affine transformed at the feature level to show
different facial expressions. Each transformation was defined a priori by the human user as a combination of
action units and their relative strengths. From the transformed data, a robot program sculpturing the person's face is
automatically generated with different cutting conditions for individual features like nose, lips, and others.

The z value of a mesh point is calculated by a linear interpolation of the z values of adjacent grid points
determined from the regularly spaced lines parallel to either the x or y axis and the measured contour data of a face, as
shown in Figure 2. After the model mesh data for a face is approximately matched with the Z-map data, additional
vertices are added to make the surface of the face smooth and expressions look natural, as shown in Figure 3-b.

-  Facial  Expression  Composer 

The modeling of a human expression was very time consuming and the result was an approximate expression
with no individual differences. Initially a facial expression composer was developed to input various action
combinations and their intensities by a musical chord. This is achieved by mapping each key of a MIDI
keyboard to an action unit. Thereby, a MIDI keyboard player can generate an arbitrary facial expression by
playing a musical chord. Unfortunately, this method may create an expression with no corresponding human
facial expression. As another research issue, this system was extended to animating the change of a facial
expression according to a musical chord. A facial expression representing the listener's emotion when listening to
music changes according to the musical passage. This is the mapping of the emotion to the expression. A facial
expression changes according to the musical passage when musical attributes like harmony, tempo
and pitch are mapped to human emotions, which are in turn mapped to facial expressions. The number
of expression types used in that research is nine, and each type is subdivided into several sub-expressions
depending on the intensities of the actions.



Figure  2.  Conversion  of  measured  data  into  regular  mesh  data 

Figure  3.  Model  face  with  characteristic  points  and  Mesh  model  of  an  expressionless  face 


1.4  Problems 


The process mentioned above has limits in several respects. The resultant expressions are not realistic enough
because they are generated by a transformation defined by a human who laboriously generated the rule by
trial and error for a generic face. When applied to a target face, it can only approximate the target person's
facial expression. That is, the synthesized facial expression does not account for individual differences. In
addition, facial expressions are quite diverse. For example, surprise is an expression distinct from others
such as anger, disgust or happiness, but there is no one unique expression which represents the emotion.
We open our eyes and mouths wide to make others notice that we are surprised, but these actions are not all the
same. So the number of facial expressions we can imagine is enormous, and it is not likely that there is a canonical
set of primitive facial expressions from which all facial expressions can be generated. It is almost impossible to
analyze each individual and search, one by one and depending only on human intuition, for the right combination
of action units and intensities which will compose the expression. Here, we investigate how to find the
right combination automatically by applying GA. The genetic algorithm, widely known as a function optimizer, was
first suggested by Holland in the 1970's [9]. It has the feature of robustness. That is, it is a problem-independent
algorithm and so it has various applications. The only problem-dependent procedure is the evaluation, which is a
major part of problem modeling. The genetic algorithm, inspired by evolution, is an adaptive method which can be
used to solve optimization and search problems [4][8].

When this effort is successful, an arbitrary human expression can be registered to the system and the system
can recognize the person's facial expression among the registered facial expressions.

2.  Searching  for  facial  expression  by  Genetic  Algorithm 

We  explain  the  procedure  of  applying  GA  to  finding  the  right  combination  of  action  units  and  their  intensities 
when  given  the  Z  map  measurement  data  of  an  arbitrary  facial  expression. 

A target facial expression is determined and its vertex data are obtained by measuring the contour of a face.
The measurement data of this experiment, which were used by the face sculpturing robot at EXPO'93, are obtained by
using projection type multislit beam topography. Because an action unit acts only on the coordinates of mesh
vertices of a face, these measurement data must be converted to the model mesh format. By using characteristic
points that are identified on the measurement data as anchors, the model mesh is transformed to fit the measurement
data. These intermediary data are used as evaluation parameters. The combination of action units and their
intensities causing the minimum difference of the vertex coordinates from a target face, whose mesh data we
already know, is then found. The target data obtained by trial and error with human intuition are used only for testing.
However, if the mesh data of a target face are replaced by ones constructed from the measured 3D contour
data of an arbitrary facial expression, or by model mesh data dynamically matched with 2D picture images, we can
obtain the information of muscular movements for arbitrary realistic expressions. So by using GA, we can reduce
our problem, synthesizing facial expressions and making canonical sets of facial expressions more realistic, to a
cost-minimizing heuristic search.
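
The cost being minimized can be illustrated with a small sketch. The text only states that an action
unit changes the coordinates of its vertices and that the intensity determines the degree of change,
so the linear displacement model, the sum of squared coordinate differences, and the names
apply_action_units and cost used here are assumptions for illustration.

```python
def apply_action_units(neutral, action_units, intensities):
    """Deform a neutral mesh; each AU maps vertex index -> full-intensity offset,
    and each offset is scaled by the intensity of that AU (assumed linear)."""
    mesh = [list(v) for v in neutral]
    for au, strength in intensities.items():
        for vertex, offset in action_units[au].items():
            for axis in range(3):
                mesh[vertex][axis] += strength * offset[axis]
    return [tuple(v) for v in mesh]


def cost(mesh, target):
    """Sum of squared coordinate differences between two meshes."""
    return sum((a - b) ** 2 for v, t in zip(mesh, target) for a, b in zip(v, t))


# Hypothetical two-vertex face with made-up offsets for AU 1 and AU 26.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
action_units = {1: {0: (0.0, 1.0, 0.0)}, 26: {1: (0.0, -2.0, 0.0)}}
target = apply_action_units(neutral, action_units, {1: 0.4, 26: 0.4})
```

A candidate combination that reproduces the target exactly has cost 0; the GA searches for the
intensities that bring this value as close to 0 as possible.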

2.1  Two approach methods

The goal of the searching process is to find the parameters giving the global minimum of the evaluation. The
genetic algorithm, the optimal search technique, was implemented in two different ways. By nature, a
genetic algorithm is based on a string structure, which represents the chromosome of an object being evolved.
The first method models a chromosome with a bit string, while the second one implements the combination of
action units with an integer array. The first implementation, using bit strings, is the typical modeling case in
genetic algorithms. Most GA programs use a binary bit string model because it matches Holland's schema
theorem best. However, we also apply another method, an integer model, which is executed with the problem
domain reduced and preliminary information about the problem appended.


2.2  Binary  bit  string  model 
-  Modeling 


In the binary bit string model, a chromosome consists of the intensity information of 35 AUs out of the 46 possible
AUs (only 35 AUs have been implemented), and four bits are assigned to each AU so that 16 intensities can be
represented. Each bit string is encoded as a Gray code so that a bit mutation does not cause a critical disturbance [3].
In Figure 4, the intensities of AU1 and AU4 are 7 and 4 respectively.
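
The encoding can be sketched as follows, assuming the standard reflected binary Gray code and
four bits per AU written most significant bit first; the bit order and the function names are
assumptions, since the text does not specify them.

```python
BITS_PER_AU = 4  # four bits per action unit give 16 intensity levels


def binary_to_gray(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Inverse of binary_to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def encode(intensities):
    """Concatenate the Gray-coded intensities into one list of bits."""
    bits = []
    for value in intensities:
        g = binary_to_gray(value)
        bits.extend((g >> k) & 1 for k in reversed(range(BITS_PER_AU)))
    return bits


def decode(bits):
    """Recover the intensities from a chromosome produced by encode."""
    values = []
    for start in range(0, len(bits), BITS_PER_AU):
        g = 0
        for b in bits[start:start + BITS_PER_AU]:
            g = (g << 1) | b
        values.append(gray_to_binary(g))
    return values


chromosome = encode([7, 4])
```

Because neighbouring intensities differ in exactly one bit under this code, a small change of
intensity never requires several bits to flip at once, unlike plain binary where 7 and 8 differ in
all four bits.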

Figure 4. Chromosome model using binary bit string

-  Evolutionary  process 

The GA procedure is approximately divided into four parts: evaluation, selection, crossover and mutation.

.  Evaluation function

[...] means that the population loses its diversity, and the possibility that the searching process pre-converges
to a local minimum (or maximum) increases. However, if diversity increases too much, in other words, if most of the
individuals have similar fitness values, the evolutionary process will be an excessively time-consuming one and the
[...] will increase. Therefore, a balance between the two contrary aspects, diversity and
convergence, is necessary. In our experiment, the fitness value of an individual is set to the square of a raw
fitness value, and 0.01 is added to the exponent at every generation to enlarge the differences of the fitness
values among individuals in a population, as shown in Figure 5.

.  Selection,  Crossover  and  mutation 

The process of reproduction can be divided into selection and crossover. The selection process applies a
mixture of two methods: one that always keeps the fittest individual in the next generation and fills the population of the
next generation with members of the previous generation according to their expected values of selection,
and that of a roulette wheel. We take the multiple point crossover scheme because single point crossover gives
insufficient changes for the chromosomes of this experiment, which have no fewer than 144 bits as their genes.
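The selection scheme described above can be sketched as follows. This is one possible reading, not the authors' code: the function name `select`, the order of the three steps and the use of the integer part of the expected value are assumptions, and fitness values are assumed to be positive.

```python
import random


def select(population, fitness, rng=random):
    """Build the next generation from elitism, expected values and a roulette wheel.

    Assumed interpretation: (1) the fittest individual always survives,
    (2) each individual receives int(f_i / mean_f) guaranteed copies,
    (3) any remaining slots are filled by fitness-proportional sampling.
    """
    size = len(population)
    mean = sum(fitness) / size
    best = max(range(size), key=lambda i: fitness[i])

    next_generation = [population[best]]          # elitism
    for i in range(size):                         # expected-value part
        for _ in range(int(fitness[i] / mean)):
            if len(next_generation) < size:
                next_generation.append(population[i])

    remaining = size - len(next_generation)       # roulette-wheel part
    if remaining > 0:
        next_generation.extend(rng.choices(population, weights=fitness, k=remaining))
    return next_generation
```

Keeping the best individual prevents the best solution found so far from being lost, while the roulette wheel step still gives weaker individuals a chance to contribute.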

Crossover has two aspects. It makes two crossover chromosomes exchange each other's good genes and
improves their characters and, from the other viewpoint, it gives diversity to the chromosomes by modifying the
genes which lie on the borders during the exchange process. For our chromosomes, if we apply the single
crossover scheme, only one among 35 AUs is affected. After many generations, the evaluation converges
gradually to some point, but in many cases it is trapped in a local minimum or local maximum. In
this situation we must provide diversity to the population, which is supplied, in most cases, by mutation. In our
experiment, we have crossover also take this role actively by modifying the number of crossover points
dynamically as the distribution of the average fitness of the population is gradually confined to a
restricted region. This implementation is opposite to De Jong's opinion [10] that an increased number of crossover
points is likely to result in a random shuffle, so that fewer important schemata can be preserved. However, the
experiment showed that it could help to improve performance, as shown in Figure 6, because the
increased points of crossover operations were used for increasing diversity when a population was pre-converging
to a local optimum.
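A multi-point crossover with a variable number of points can be sketched as below. The source does not give the rule that maps the fitness distribution to a number of points, so `crossover_points`, its bounds and its linear formula are purely hypothetical. Only the idea of using more points as the population clusters comes from the text.

```python
import random


def multipoint_crossover(parent_a, parent_b, points, rng=random):
    """Exchange alternating segments of two equal-length gene lists at `points` cuts."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    cuts = sorted(rng.sample(range(1, len(parent_a)), points))
    child_a, child_b = [], []
    swapped = False
    previous = 0
    for cut in cuts + [len(parent_a)]:
        segment_a, segment_b = parent_a[previous:cut], parent_b[previous:cut]
        if swapped:
            segment_a, segment_b = segment_b, segment_a
        child_a.extend(segment_a)
        child_b.extend(segment_b)
        swapped = not swapped
        previous = cut
    return child_a, child_b


def crossover_points(fitness, min_points=1, max_points=8):
    """Hypothetical rule: use more cut points when fitness values are tightly clustered."""
    mean = sum(fitness) / len(fitness)
    if mean <= 0:
        return max_points
    spread = min(1.0, (max(fitness) - min(fitness)) / mean)
    return round(max_points - spread * (max_points - min_points))
```

A call such as `multipoint_crossover(a, b, crossover_points(fitness))` then applies more cuts to a population whose fitness values have collapsed into a narrow band.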

The mutation rate is adapted to the distribution of the population at each generation. As the normalized average
fitness of the population increases, the mutation rate increases. However, increasing diversity can aggravate the
converging process by causing the evaluation function to oscillate. In such cases, the algorithm becomes similar to
a random search and makes no sense. It is important to harmonize those two aspects of the total process,
diversity and stability, to make the evaluation function converge to the global minimum or maximum. In our
experiment, the search space is very large and the evaluation function is multi-modal, so the search is likely to be trapped
in a local optimum; diversity is therefore emphasized.


Figure  5.  Fitness  scaling  Figure  6.  multi-point  vs.  single  point  crossover 


-  Convergence  :  experimental  result 


(e) 300th generation (f) target expression

Figure  7.  Convergence  process  of  binary  bit  string  model 


The individuals of a population gradually approach an adjacent region in a search space through many
generations, and this is called convergence. Figure 7 shows the convergence process when the size of the
population is 300, the probability of a crossover is 0.7, the mutation rate is the cube of a normalized fitness, and the
selection schedule is the mixture of methods using both the roulette wheel and the expectation value of each individual.
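The adaptive mutation rate can be sketched as follows. The text states only that the rate is the cube of a normalized fitness. Normalizing the average fitness by the best fitness in the population, and applying the rate per bit, are assumptions made here for illustration.

```python
import random

POPULATION_SIZE = 300         # from the text
CROSSOVER_PROBABILITY = 0.7   # from the text


def mutation_rate(fitness):
    """Cube of the normalized average fitness (normalization by the best value is assumed)."""
    best = max(fitness)
    if best <= 0:
        return 0.0
    normalized = (sum(fitness) / len(fitness)) / best
    return normalized ** 3


def mutate(bits, rate, rng=random):
    """Flip each bit independently with probability `rate` (per-bit application is assumed)."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in bits]
```

Under this normalization the rate is close to zero while a few individuals dominate and approaches one as the whole population reaches similar fitness. A practical implementation would probably scale or cap the value.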


2.3  Chromosome  modeling  using  an  integer  array 

Our second implementation, an integer array model, gives a result similar to that of the first except for its faster
convergence. This is because we reduced the search space by limiting the number of AUs within a chromosome
to 15, since in most instances the number of AUs composing a facial expression is less than ten - the
search space is reduced from 2 to 35 C ,3. Unlike in the bit string model, its crossover points are the
borders of the AUs, which gives no modification to the AU parameters, so that it has to depend mainly on
mutation for their modification. The mutation rate must therefore be higher than the former's, but this scheme causes the
distribution of the evaluation values to widen and the average evaluation curves to oscillate more severely. This effect
is obvious when comparing the convergence process of average fitness in Figure 8 with that of Figure [...]. In
addition, so that diversity is also provided by the crossover operation, when both chromosomes have the
same AU as a gene, the intensity of the AU is increased or decreased by the value Difference * Random_gen
(MAX_INTENSITY) / MAX_INTENSITY with the probability of O.o. Figure 8 and Figure 9 show the
convergence process of the integer array model when the probability of chromosome mutation is 0.3, the AU mutation
probability is 0.1, the probability of an intensity mutation is 0.2 and the other parameters are the same as in the
former method.
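The intensity adjustment applied during crossover in the integer array model can be sketched as below. Several points are assumptions: a chromosome is represented as a dictionary from AU number to intensity, Difference is taken to be the absolute difference between the two parents' intensities for the shared AU, Random_gen(MAX_INTENSITY) is taken to return an integer between 0 and MAX_INTENSITY, and MAX_INTENSITY is set to 15. The probability is left as a parameter because its value is not legible in the source.

```python
import random

MAX_INTENSITY = 15  # assumption: the same 16-level scale as the bit string model


def blend_shared_aus(parent_a, parent_b, probability, rng=random):
    """Return a copy of parent_a whose intensities for AUs shared with parent_b are perturbed.

    Each shared AU is shifted, with the given probability, by
    Difference * Random_gen(MAX_INTENSITY) / MAX_INTENSITY, added or subtracted.
    """
    child = dict(parent_a)
    for au, intensity in parent_a.items():
        if au in parent_b and rng.random() < probability:
            difference = abs(intensity - parent_b[au])
            step = difference * rng.randint(0, MAX_INTENSITY) // MAX_INTENSITY
            sign = rng.choice((-1, 1))
            child[au] = min(MAX_INTENSITY, max(0, intensity + sign * step))
    return child
```

Because the step never exceeds the difference between the parents, the perturbation stays small when the two parents already agree on an AU's intensity.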


2.4  Difficulties  in  convergence  process 

There is a problem still to be solved in both cases: the numerical evaluation data do not completely
converge to the global optimum, although the resultant facial expression is almost the same as the target
expression when we compare the two with our eyes. We suppose that this is mainly due to the multi-
modality of the evaluation function. To solve the problem, it may be useful to apply a hybrid method combining the
genetic algorithm with local search techniques such as hill-climbing, and to use other strategies, for example
adaptation of the mutation probability to a given environment [9] or the sequential niche method [0]. Methods
that apply the GA's parallelism to sequential methods such as simulated annealing [6] [7] have been published, and
those concepts will also be helpful for the oscillation problem.


Figure 8. Evolutionary process (curves: minimum evaluation, average evaluation)


(e) 500th generation (f) target expression

Figure 9. Appearance changes through generations


3.  Conclusions 

To build natural-looking faces, it is necessary to find the combination of action units, the components of a facial expression, and their
intensities for each expression. This paper presents a method for automatically searching for the parameters of
the elements composing facial expressions. Facial expressions are reflections of
human emotion and thought. Therefore, finding their components means constructing a basis for
creating a medium possessing human feeling, which connects man with a machine.

Especially in VR applications, constructing a realistic facial expression is an indispensable part of building an
artificial agent in a virtual world, and it increases the realism of a medium which links the real world with a
virtual one. It makes the environment of the man-machine interface more familiar to man. For instance, if
facial expressions are used as a means of communication between a guide machine and its users, they can eliminate
the user's feeling of rejection which often arises when we face a machine. A facial expression is an effective tool
for a man to communicate with other humans, and it can be used to represent the inner state of a system whose
operation state is difficult to estimate and understand at a single glance.

In addition, from the viewpoint of information transmission, it helps to reduce the amount of data to be transferred.
When it is necessary to transfer facial expressions, if all their vertex data or the differences of the coordinates of
the vertices between them are to be transferred, there will be a large amount of data, and this may cause
congestion in the communication channel even with advanced image compression methods like MPEG II. So if
we can transfer only the control parameters of the action units to change facial expressions from one basic expression,
we can expect a large data compression. Moreover, our system can be modified to analyze the
psychological state of a man. If the data of a facial expression are given, we can extract the components of the
expression and infer the emotional state it conveys.

With these purposes in mind, this research focuses on generating facial expressions at will by implementing
AUs which describe the motion of human muscles more accurately.


Abstract

This paper presents an implemented system for automatically producing prosodically appropriate speech and
corresponding facial expressions for animated, three-dimensional agents that respond to simple database queries in
a 3D virtual environment. Unlike previous text-to-facial animation approaches, the system described here produces
synthesized speech and facial animations entirely from scratch, starting with semantic representations of the message
to be conveyed, which are based in turn on a discourse model and a small database of facts about the modeled world.


1  Introduction 

As research on the simulation of autonomous virtual human agents progresses, two major issues in human-machine
interaction must be addressed. First, proper intonation is necessary for conveying the information structure of utterances
with respect to the underlying discourse structure, expressing important distinctions of contrast and focus ([27], [24],
[25]). Second, realistic facial expressions and lip movements help in providing relevant information about discourse structure,
turn-taking protocols and speaker attitudes ([8], [9], [18]). Moreover, in a face-to-face conversation, facial displays
play an important communicative role.

Simulating this communicative role for animation requires symbolic specification of the semantics and pragmatics
of movements. Faces change expressions continuously, and many of these changes are synchronized with what is
going on in concurrent conversation. Facial expressions are linked to the content of speech (scrunching one's nose
when talking about something unpleasant) as well as affect (smiling when remembering a happy event). They can
replace sequences of words (e.g. “the food was [wrinkle nose, stick out tongue]”) as well as accompany them [9],
and they can serve to help disambiguate what is being said when the acoustic signal is degraded. They do not occur
randomly but rather are synchronized to one’s own speech, or to the speech of others [6], [15]. It is therefore important
that the specification of facial expressions takes many different levels of organization into account. We propose
that integrating models for generating proper intonation and facial expressions will improve the intelligibility and
naturalness of utterances produced by meaning-to-speech systems as well as by more elaborate systems involving
virtual animated human agents (e.g. [3]).

The intonation generation model is based on Combinatory Categorial Grammar (CCG - cf. [27]), a formalism
which easily integrates the notions of syntactic constituency, prosodic phrasing and information structure. Based on
the CCG grammar, a simple discourse model and a small knowledge base represented in Prolog, the system produces
spoken responses to database queries with appropriate intonation. Given the precise timings for phonemes and
intonational phenomena in the speech wave, we produce precise specifications for generating the lip movements and
facial expressions for a graphical model of a human head. Results from our current implementation demonstrate the
system’s ability to generate a variety of intonational possibilities and facial animations for a given sentence depending
on the discourse context.

Previous work in the area of intonation generation includes studies by Terken ([29]), Houghton and Pearson ([13]),
Isard and Pearson ([14]), Davis and Hirschberg (cf. [7], [12]), and Zacharski et al. ([31]). Benoit et al. ([1]),

• We would like to thank particularly Dr. Norman I. Badler and Dr. Mark Steedman for their very useful comments. We are grateful to AT&T
Bell Laboratories for allowing us access to the TTS speech synthesizer, and [...]
advice on its use. The usual disclaimers apply. The research was supported in part by NSF grant nos. IRI90-18513, IRI9[...],
CISE IIP-CDA-88-22719, DARPA grant no. N00014-90-J-1863, and ARO grant no. DAAL03-89-C0031.
Figure 1: Architecture (input: prosodically annotated question; outputs: spoken response, graphic output)


Brooke ([2]), Cohen et al. ([4]), Hill et al. ([11]), Lewis et al. ([16]) and Terzopoulos et al. ([30]) have worked
on synchronizing lip movements with speech, producing quite striking results. Takeuchi et al. ([28]) implemented a
user-interface  in  which  a  3D  facial  model  responds  to  queries  posed  by  a  user.  In  this  system,  the  generation  of  the 
facial  expressions  accompanying  the  answer  depends  on  an  analysis  of  the  conversational  situation  and  the  selection 
of  facial  expressions  from  a  database  of  facial  displays. 

The  system  described  here  expands  the  work  of  the  aforementioned  researchers  by  linking  contextually  appropriate 
intonation  with  the  corresponding  facial  expressions,  and  generating  the  3D  facial  animations  automatically  from 
semantic,  information  structural  and  discourse  structural  representations  [21]. 


2  The  Implementation 

Using  the  CCG  theory  of  prosody  outlined  in  [27],  [24]  and  [25],  the  implemented  system  undertakes  the  task  of 
specifying  contextually  appropriate  intonation  and  facial  animation  for  spoken  responses  to  database  queries.  The 
process,  which  is  illustrated  in  figure  1,  begins  with  a  fully  segmented  and  prosodically  annotated  representation 
of  a  spoken  query,  as  shown  in  example  (1),  which  involves  a  simple  database  of  facts  about  stereo  components. 
The notational system representing the intonation contour in example (1) is an adaptation of the widely used system
developed by Pierrehumbert ([23]).¹ For simplicity, we show accented words in capital letters without regard for the
different  possible  types  of  accents.  A  simple  CCG  parser  determines  the  semantics  of  the  question,  dividing  it  into  its 
theme,  which  identifies  what  the  sentence  is  about,  and  its  rheme,  which  identifies  what  is  important  or  salient  about  the 
theme.  We  refer  to  this  division  of  the  utterance  into  theme  and  rheme  as  its  information  structure.  Certain  elements  of 
the  theme  and  rheme  may  be  particularly  salient  because  they  are  new  to  the  discourse  or  serve  to  distinguish  among 
entities  or  propositions  that  are  already  firmly  established  in  the  discourse.  We  say  such  items  are  in  focus,  and  mark 
them with the * operator, as shown in example (2).²

(1)  I  know  which  components  produce  muddy  bass, 
but  WHICH  components  produce  clean  bass? 

L+H*  LH%  H*  LL$ 


¹ The L+H* and H* markings represent different types of pitch accents in the fundamental frequency contour. The LH% and LL$ markings
represent prosodic boundaries. For a brief explanation of the Pierrehumbert-style markings, see [26].

² A full explanation of the semantic and syntactic representation in (2) is beyond the scope of this paper. The interested reader should refer to
[27] and [26].
(2) Proposition:

s : λx.component(x) & produce(x, *clean(bass))

Theme:

s : λx.component(x) & produce(x, *clean(bass)) /
(s : produce(x, *clean(bass)) \ np : x)

Rheme:

s : produce(x, *clean(bass)) \ np : x

The  content  generation  module  has  the  task  of  determining  the  semantics  and  information  structure  of  the  response, 
marking  focused  items  based  on  the  contrastive  stress  algorithm  described  in  [25].  For  the  question  given  in  (1),  the 
strategic  generator  produces  the  representation  for  the  response  shown  in  example  (3),  where  the  appropriate  theme  can 
be paraphrased as “what produces clean bass”, the appropriate rheme as “amplifiers”, and where the context includes
alternative  components  and  audio  qualities. 

(3) Proposition:

s : produce(*amplifiers, *clean(bass))

Theme:

s : produce(x, *clean(bass)) \ np : x

Rheme:

np : *amplifiers

Using  the  output  of  the  content  generator,  the  CCG  generation  module  (described  in  [24])  produces  a  string  of 
words  and  Pierrehumbert-style  markings  representing  the  response,  as  shown  in  example  (4). 

(4)  AMPLIFIERS  produce  CLEAN  bass. 

H*  L  L+H*  LH$

The final aspect of speech generation involves translating such a string into a form usable by a suitable speech
synthesizer. The current implementation uses the Bell Laboratories TTS system [17] as a post-processor to synthesize
the speech wave and produce precise timing specifications for phonemes. The duration specifications are then
automatically annotated with pitch accent peaks and intonational boundaries in preparation for processing by the facial
expression rules (see also [3]).

Most  facial  animation  systems  use  the  Facial  Action  Coding  System  (FACS),  developed  by  Ekman  and  Friesen 
[10],  to  annotate  facial  action.  The  system  describes  the  visible  muscular  action  based  on  anatomical  studies,  using 
basic  elements  called  action  units  (AU),  which  refer  to  the  contraction  of  one  muscle  or  a  group  of  related  muscles.  A 
facial  expression  is  described  as  a  set  of  AUs. 

Certain  facial  expressions,  which  serve  informational  structural  functions,  accompany  the  flow  of  speech  and  are 
synchronized  at  the  verbal  level.  Facial  movements  (such  as  raising  the  eyebrows  or  blinking  while  saying  “amplifiers 
produce CLEAN bass”) can appear during accented syllables or pauses. These functions are based on the following
determinants:  conversational  signals,  punctuators  and  manipulators.  Conversational  signals  correspond  to  movement 
occurring  on  accented  or  emphatic  items  to  clarify  or  support  what  is  being  said.  These  can  be  eyebrow  movements 
(the  most  commonly  used  facial  expression),  head  nods,  or  blinks.  Punctuators  are  movements  which  occur  on  pauses, 
reducing  the  ambiguity  of  the  speech  by  grouping  or  separating  sequences  of  words  into  discrete  unit  phrases  [5]. 
Slow head movement, blinks, or a smile can accompany a pause. Manipulators correspond to biologically necessary
functions  like  blinking  to  wet  the  eyes. 

As  we  have  seen,  a  facial  expression  can  have  a  variety  of  different  meanings  (e.g.  accentuating  an  element, 
punctuating  a  pause).  We  propose  a  high  level  programming  language  to  describe  them,  amounting  to  a  formal 
notation  for  the  different  clusterings  of  facial  expressions.  Indeed,  rather  than  using  a  set  of  AUs  to  specify  facial 
expressions  in  terms  of  intonational  features  in  speech,  it  is  more  convenient  to  express  them  at  a  higher  level,  directly 
denoting  their  function.  These  operations  are  then  mapped  onto  sequences  of  AUs  so  that  we  are  able  to  model  different 
facial  “styles”,  in  the  sense  that  people  differ  in  their  way  of  emphasizing  a  word  and  in  the  number  of  facial  displays 
they  use.  For  example,  Ekman  [9]  found  that  most  people  use  raised  eyebrows  to  accompany  an  accent  while  the  actor 
Woody  Allen  uses  eyebrow  positions  (inner  and  downward)  which  generally  imply  sadness. 
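The idea of describing facial displays by function and mapping them onto AUs per "style" can be sketched as follows. The AU numbers are standard FACS codes (1 inner brow raiser, 2 outer brow raiser, 4 brow lowerer, 45 blink), but the style names, the table contents and the choice of AU1 plus AU4 for the "inner and downward" brow position are assumptions made for illustration, not the notation proposed in the paper.

```python
# Hypothetical style tables: facial function -> list of FACS action units.
STYLES = {
    "default": {"accent": [1, 2], "pause": [45]},   # raised eyebrows on accents
    "sad_brows": {"accent": [1, 4], "pause": [45]},  # inner-and-downward brows on accents
}


def facial_actions(events, style="default"):
    """Map (word, function) events to (word, AUs) pairs according to a style table.

    Events whose function is unknown to the style are skipped.
    """
    mapping = STYLES[style]
    return [(word, mapping[function]) for word, function in events if function in mapping]


events = [("AMPLIFIERS", "accent"), ("bass", "pause")]
default_display = facial_actions(events)
sad_display = facial_actions(events, style="sad_brows")
```

Changing the style argument changes the AUs that realize an accent without touching the higher-level description of the utterance.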

Our  algorithms  incorporate  synchrony  ([6]),  create  coarticulation  effects,  emotional  signals,  and  eye  and  head 
movements  ([19],  [20]).  The  facial  animation  system  scans  the  input  utterances  and  computes  the  associated  movements 
for the lips, the conversational signals and the punctuators. Conversational signals start and end with the accented word.
For instance, on amplifier, the brow starts raising on ‘a’, remains raised until the end of the word, and ends raising on
‘r’. On the other hand, the punctuator signals, such as smiling, coincide with pauses. Blinking is synchronized at the
phoneme level, due to biological necessity, accentuation or pausing. On amplifier, for example, the eyes start closing
on ‘a’, remain closed on ‘m’ and start opening on ‘p’.
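The timing of a conversational signal over an accented word can be sketched as follows. The keyframe representation, the function name and the phoneme timings in the example are hypothetical. Only the rule that the brow rises on the first phoneme, stays raised, and is released on the last phoneme comes from the text.

```python
def brow_keyframes(phonemes):
    """Return (time, level) keyframes for an eyebrow raise spanning one accented word.

    `phonemes` is a list of (symbol, start, end) tuples in seconds. The brow rises
    during the first phoneme, is held, and returns to rest during the last phoneme.
    """
    if len(phonemes) < 2:
        raise ValueError("need at least two phonemes to separate onset and release")
    first, last = phonemes[0], phonemes[-1]
    return [
        (first[1], 0.0),  # rest position at word onset
        (first[2], 1.0),  # fully raised by the end of the first phoneme
        (last[1], 1.0),   # held until the last phoneme begins
        (last[2], 0.0),   # back at rest when the word ends
    ]


# Illustrative timings only; real values would come from the speech synthesizer.
word = [("a", 0.0, 0.1), ("m", 0.1, 0.2), ("r", 0.5, 0.6)]
keys = brow_keyframes(word)
```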

The  computation  of  the  lip  shape  is  done  in  three  passes.  First,  phonemes,  which  are  characterized  by  their  degree 
of  deformability,  are  processed  one  segment  at  a  time  using  the  look-ahead  model  to  search  for  the  proximal  deformable 
segments whose associated lip shapes influence the current segment. For example, in amplifier the ‘l’ receives the
same  lip  shape  as  the  following  vowel  ‘i’— that  is,  the  movement  of  the  ‘i’  begins  before  the  onset  of  its  sound.  Second, 
the  spatial  properties  of  muscle  contractions  are  taken  into  account  by  adjusting  the  sequence  of  contracting  muscles 
when  antagonistic  movements  succeed  one  another  (i.e.  movements  involving  very  different  lip  positions,  such  as 
pucker  movements  versus  the  extension  of  the  lips).  And  finally,  the  temporal  properties  of  muscle  contractions  are 
considered by determining whether a muscle has enough time to contract before (or relax after) the surrounding lip
shape.
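The first pass, in which a deformable segment takes on the lip shape of a following segment, can be sketched as below. This is a simplified reading: deformability is treated as a yes/no property although the text speaks of degrees, only the look-ahead direction is modeled, and the phoneme symbols and shape labels are placeholders.

```python
def apply_look_ahead(segments, deformable):
    """Give each deformable segment the lip shape of the next non-deformable segment.

    `segments` is a list of (phoneme, lip_shape) pairs and `deformable` is the set
    of phonemes whose own lip shape yields to context. Deformable segments with
    no following non-deformable segment keep their original shape.
    """
    result = list(segments)
    next_shape = None
    for index in range(len(segments) - 1, -1, -1):  # scan right to left
        phoneme, shape = segments[index]
        if phoneme in deformable:
            if next_shape is not None:
                result[index] = (phoneme, next_shape)
        else:
            next_shape = shape
    return result


# In 'amplifier' the 'l' receives the lip shape of the following vowel 'i'.
fragment = [("p", "closed"), ("l", "neutral"), ("i", "spread")]
shaped = apply_look_ahead(fragment, deformable={"l"})
```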

The  tongue,  although  not  highly  visible,  is  an  important  element  of  distinction  between  phonemic  elements, 

especially  when  these  elements  are  not  differentiated  by  their  lip  shapes.  The  tongue  is  composed  of  2  parallel 
surfaces,  each  of  them  made  of  10  triangles.  A  tongue  shape  is  defined  by  varying  the  tongue  parameters,  including 
the  length  of  the  edges  of  the  triangles  and  the  angles  between  each  of  the  edges.  Modifying  the  length  of  the  edges 
allows  for  the  narrowing,  flattening,  stretching  and/or  compression  of  the  tongue,  while  changing  the  value  of  the 
angles  between  edges  allows  the  tongue  to  bend,  curve  and/or  twist.  This  model  is  a  simplification  of  [22]. 


3  Examples 

In  the  examples  shown  below,  the  speaker  manifests  different  behaviors  depending  on  whether  s/he  is  asking  a  question, 
making  a  statement,  accenting  a  word  or  pausing.  When  asking  a  question,  the  speaker  raises  the  eyebrows  and  looks 
up slightly to mark the end of the question. When replying, or when turning over the floor to the other person, the
speaker  turns  the  head  toward  the  listener.  To  emphasize  a  particular  word,  s/he  raises  the  eyebrows  and/or  blinks. 
During  the  brief  pauses  at  the  end  of  statements  and  within  statements,  the  speaker  blinks  and  smiles. 

(5)  I  know  which  amplifier  produces  clean  bass,

but  which  amplifier  produces  clean  treble?

L+H*  LH%  H*  LL$

The  BRITISH  amplifier  produces  clean  treble.

H*  L  L+H*  LH$

(6)  I  know  which  British  component  produces  muddy  treble,

but  which  British  component  produces  clean  treble?

L+H*  LH%  H*  LL$

The  British  amplifier  produces  clean  treble.

H*  L  L+H*  LH$

In utterance (5), the word British is accented and accompanied by a raised eyebrow, which indicates a conversational
signal denoting contrast. In utterance (6), on the other hand, the word amplifier is accented and marked by the action
of the eyebrows and a blink (see figure 2). The same argument differentiates the appearance of the movement on the
word treble in (5) and the word clean in (6). Moreover, a punctuating blink marks the end of (6), starting on the pause
after the word treble. In (5) a blink coincides with the accented word treble (as a conversational signal) and with the
pause marking the end of the utterance (as a punctuator), resulting in two blinks emitted in succession at the end of
the utterance. In both examples, the pause between the two intonational phrases ‘the British amplifier’ and ‘produces
clean treble’ is accompanied by movement of the eyebrows.
Figure 2: ‘amplifier’


4  Conclusions 

The system described above produces quite sharp and natural-sounding distinctions of intonation contour, as well as
visually distinct facial animations, for minimal pairs of queries and responses generated automatically from a discourse
model and a simple knowledge base. The examples in the previous section (and others presented at the workshop)
illustrate the system’s capabilities and provide a sound basis for exploring the role of intonation and facial expressions
in a 3D virtual environment. Future areas of research include evaluating results and exploring the relevance of our
current system to large scale animation systems involving autonomous virtual human agents (cf. [3]).
1  Introduction

The  Praxitele  f  of  autonomous  motion 

Existing driving simulators [21, 28, 13, 11, 17] differ from the Praxitele Project requirements
in two points:

•  They are all designed to integrate a real driver in the simulation loop, and the whole simulation is based
on this aspect (movement restitution, realistic driving interface, ...).

•  The  simulated  environment  is  not  urban. 

The Praxitele project is presented in more detail in the next section. Section 3 presents a [...]

the last section we discuss the evolution of this work.


2  Aims  and  goals  of  the  Praxitele  Project 

[...] the other in computer science and automation [...], in cooperation with large
[...] (RENAULT, EDF, CGEA). This project designs a novel transportation system ba[...]
[...] public [...] central computer [24]. These public [...]
[...] operation can be automated in specific [...]. The
[...] the subject of experiments several times but with poor results. The failure of these
[...] cooperative driving of a platoon of
vehicles, only the first car is driven by a human operator [23]. This function is essential to move
the empty vehicles easily from one location to another.

Before the first experiments with real automated cars in real cities, we have to design and implement a
simulation platform. This platform makes it possible to simulate a platoon of vehicles evolving in a virtual urban
environment and thus to test the control algorithms of the automated cars.


3  A  Simulation  Platform 

Motion control models are the heart of any simulation system: they determine the friendliness of the
user interface, the class of motions and deformations produced, and the application fields. Motion control
models can be classified into three general families: descriptive, generative and behavioral models [14].
Descriptive models are used to reproduce an effect without any knowledge about its cause; a subset of
instantaneous states is expressed either absolutely or relatively over time, and a spatio-temporal
trajectory in the system description space is obtained by interpolation. Unlike descriptive models, generative
models give a causal description of object movement (they describe the cause which produces
the effects), for instance its mechanics. The goal of behavioral models is to simulate organisms (plants)
[8, 25] and living beings (animals and persons) [2, 3], their actions and their responses to stimulation.
One goal of behavioral models is to provide the user with higher-level control of movement. With the
intention of making these three kinds of models work together, we are interested in their integration into
a single simulation platform. This integration offers
each dynamic entity a more realistic and richer environment, and thereby increases the possible
interactions between an agent/actor and its environment.

The simulation platform is composed of a set of agents/actors whose synchronization and communication
are managed by a real-time kernel (cf. figure 1). The main part of this kernel is the general
controller. Communication between the agents is both synchronous and asynchronous. The synchronous
part is data-flow based: each agent has its own frequency and is managed by the general controller.
The data-flow communication channels therefore include all the mechanisms needed to adapt to the local frequencies of
the sender and receiver agents (over-sampling, sub-sampling, interpolation, extrapolation, etc.). The
asynchronous part is based on event-based communication between agents and the general controller.
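The frequency adaptation performed by a data-flow channel can be sketched as follows. The class and method names are hypothetical, values are assumed to be scalar, and linear interpolation with a hold of the nearest sample outside the known range is only one of the mechanisms the text lists.

```python
from bisect import bisect_right


class DataFlowChannel:
    """Time-stamped channel that lets a receiver sample at its own frequency."""

    def __init__(self):
        self._times = []
        self._values = []

    def send(self, time, value):
        """Store a sample; times must be strictly increasing."""
        if self._times and time <= self._times[-1]:
            raise ValueError("samples must be sent in increasing time order")
        self._times.append(time)
        self._values.append(value)

    def receive(self, time):
        """Return the value at `time`, interpolating linearly between samples.

        Outside the sampled range the nearest sample is held (a simple
        zero-order extrapolation).
        """
        if not self._times:
            raise LookupError("no sample has been sent yet")
        index = bisect_right(self._times, time)
        if index == 0:
            return self._values[0]
        if index == len(self._times):
            return self._values[-1]
        t0, t1 = self._times[index - 1], self._times[index]
        v0, v1 = self._values[index - 1], self._values[index]
        return v0 + (time - t0) / (t1 - t0) * (v1 - v0)
```

A fast receiver reading from a slow sender obtains interpolated values (over-sampling), while a slow receiver simply reads fewer points of the same trajectory (sub-sampling).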


AM: Animation Module
LC: Local Controller
GC: Global Controller
DM: Dataflow Manager
EM: Event Manager
CS: Communication Server
TMP: Temporal Manager of Processes
Agent = AM + LC


Figure  1:  layers  of  the  logical  architecture  of  the  platform 


Time is the most important element of our simulation platform, both during the specification phase and
during the execution phase. Figure 2 shows a structural view of two communicating agents in the
platform. Each component is associated with a specific task:

Global controller: the centralized controller of an animation/simulation. It schedules the execution
of the agents so as to respect the real-time constraint. It is responsible for the initialization and
dynamic configuration of the set of agents, which it performs by communicating with the agents
through events. The dynamic configuration depends on events generated by the agents or on the
analysis of an external script describing the animation/simulation.

Local controller: this controller manages the communication by events with the global controller
and by data-flow with the other agents. It also performs the temporal control of the animation
module according to the directives of the global controller.

Animation module: this module is the actual computation module. It performs the animation task,
such as user interaction, physically based model calculation, trajectory application, image synthesis
and so on. Each animation module has a local frequency that depends on its functionality.

Communication channel: the communication channel connects two agents with potentially different
local frequencies. It adapts the data-flow communication to these known frequencies by
time-stamping the messages exchanged between agents.
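
As an illustration of how a time-stamped channel can bridge two agents running at different frequencies, the following is a minimal sketch in Python. It is not the platform's implementation: the class name `DataflowChannel`, its methods, and the choice of linear interpolation and extrapolation are assumptions made for this example only.

```python
from bisect import bisect_right


class DataflowChannel:
    """Hypothetical time-stamped channel between a sender and a receiver.

    The sender pushes (time, value) samples at its own rate; the receiver
    asks for a value at any time and gets a linearly interpolated or
    extrapolated estimate.
    """

    def __init__(self):
        self._times = []
        self._values = []

    def send(self, t, value):
        # Assumption: the sender's time stamps are strictly increasing.
        self._times.append(t)
        self._values.append(value)

    def receive(self, t):
        if not self._times:
            raise ValueError("no sample has been sent yet")
        if len(self._times) == 1:
            return self._values[0]
        # Pick the pair of samples surrounding t; outside the sampled range
        # the nearest pair is reused, which yields linear extrapolation.
        i = bisect_right(self._times, t)
        i = min(max(i, 1), len(self._times) - 1)
        t0, t1 = self._times[i - 1], self._times[i]
        v0, v1 = self._values[i - 1], self._values[i]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```

A receiver that runs faster than the sender over-samples by interpolation, while one that runs slower simply reads fewer points. A real channel would also bound its history and might use a higher-order scheme.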


Figure  2:  structural  view  of  the  platform 


4  Simulation  of  an  Urban  Environment 

An urban environment is composed of many dynamic entities evolving in a static scene. These dynamic
entities have to be autonomous, controllable and realistic. The three motion control models must be
combined to describe the dynamic entities of the environment. For example, to describe
traffic lights it is not necessary to use a generative model when a descriptive model (finite state automaton)
is sufficient. On the other hand, realistic car driving requires both generative and behavioral models
(the first to simulate the dynamics of the vehicle and the second to simulate the driver).
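
To make the traffic light case concrete, a descriptive model can be sketched as a small timed finite state automaton. The Python sketch below is only an illustration: the phase names, the durations and the `TrafficLight` class are assumptions, not values or names taken from the platform.

```python
from dataclasses import dataclass

# Hypothetical phase durations in seconds (not taken from the paper).
DURATIONS = {"green": 20.0, "amber": 3.0, "red": 23.0}
# Fixed cyclic transition table of the automaton.
NEXT = {"green": "amber", "amber": "red", "red": "green"}


@dataclass
class TrafficLight:
    state: str = "red"
    elapsed: float = 0.0  # time already spent in the current state

    def step(self, dt):
        """Advance the automaton by dt seconds and return the current state."""
        self.elapsed += dt
        # Several transitions may fire if dt is longer than a phase.
        while self.elapsed >= DURATIONS[self.state]:
            self.elapsed -= DURATIONS[self.state]
            self.state = NEXT[self.state]
        return self.state
```

Because the automaton only needs its own clock, calling `step(1.0)` once per second is enough, which matches the low update rate such a module requires.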


4.1  The  static  scene 

Since we want to control evolving entities, we need to link the dynamic entities with the static scene in which
they move. This link requires semantic knowledge about the scene. If we want to simulate the life of a city
as completely as possible, we need a lot of semantic information. For the car driving simulation example,
we are particularly interested in life in the streets and less in what is happening inside buildings (cf.
figure 3). The information required for the simulation is:

1. Geometric:

• the geometry of the town,

• roadsigns,

2. Topologic:

• the road network (network of trajectories),

• a visibility grid,

3. Semantic:

• road information: roadsigns, color of traffic lights, qualitative aspect of the road, ...

• city information: names of streets, quarters, particular buildings, squares, ...

To describe such a scene, an urban modeling system is currently under development. This system is
an extension of the Scriptography (Declarative Design System) [10], in which geometric, topologic and
semantic information is mixed.


Figure 3: Geometric, Topologic and Semantic Information


4.2  Dynamic  entities 

To take natural phenomena into account, the first task is to choose a physical model to represent the
object. From a high-level description of articulated rigid body systems, a simulation black box is generated
whose inputs are torques and whose outputs are position and orientation parameters.

We now have to determine how to control this physical model, that is to say, depending on the current
state of the entity, what torque must we apply to it to obtain the desired motion? In the case of
automatic motion control, this question can be decomposed into two parts:

1. how to control the physical model?

2. what is the desired motion?

The answer to the first question consists in using motion control algorithms [19], well known in the
control and robotics communities, which make it possible to build a library of elementary actions. The
behavioral model tries to answer the second question by defining the actions and reactions of an entity
[31, 9, 16]. The behavioral model is based on two kinds of relationships between the object and its
environment: perception and action (cf. figure 4).

Different approaches have been studied for the decision part: Sensor-Effector [32, 34, 30], Behavior
Rule [6, 15, 22, 26, 29, 33, 31], Predefined Environment [7, 27] and State Machine [4]. The general
observation about current systems using these approaches is that they are ad hoc models designed
for particular cases in which the simulated environment is very simple. Possible
interactions between an object and its environment are very simple, and sensors and actuators are reduced
to minimal capabilities which, most of the time, only permit obstacle avoidance in a 2D or 3D world.
Another conclusion is that none of these models is able to take into account the explicit management
of time, either during the specification phase (memorization, prediction, action duration, etc.) or during
the execution phase (synchronization of objects with different internal times).


Figure 4: A behavioral agent immersed in its environment

We have chosen to define a multi-agent system [12, 18] in order to implement cooperation between
the different behavioral approaches in one decisional model. This decisional model is decomposed into
a set of specialized agents, which themselves use experts before proposing their diagnosis to the
decisional agent (supervisor). The work of this supervisor is to integrate all the local decisions according
to the desired behavior of the system. The supervisor consists mainly of hierarchical parallel
automata, whose transitions depend on sensor data and on events generated by lower-level agents. The
supervisor therefore decides to activate some of the specialized agents, and this decision depends on its
own state and on sensor data (cf. figure 5). The use of hierarchical parallel automata allows us to take
into account concurrency and abstraction in the description of the behavior, like Kearney et al. with their
hierarchical concurrent control state machines [16]. Because our behavioral model is integrated into
a simulation platform, we are also able to deal with real time during the specification and
execution phases.


Figure 5: One point of view: one car is stopping at the crossroads because the traffic light is red, another
one is now out of the corner.

5  An  Automatically  Driven  Dynamic  Car

A vehicle is an articulated rigid object structure if we do not consider the deformations of the tires and
the flexibility of the components [1]. The vehicle is defined by a generative model which is parametrized by a
state vector and two torques (motor and guidance). In the case of automatic control of the vehicle, we
have to describe the behavior of a virtual driver as a function of its perception of its environment.
A feedback state control algorithm [20] determines which torques are applied to the vehicle from the actions
decided by the virtual driver.


Figure  6:  Hierarchical  structure  of  the  driver  decisional  model. 

The decisional module has to choose what kind of action to perform, depending on its
current state and on its own perception of its environment (cf. figure 6). This operation is decomposed into
six stages:

1.  The  supervisor  (the  driver  module)  reads  received  messages  and  makes  an  analysis  of  data  received 
from  sensor(s). 

2.  The  supervisor  activates  specialized  agents. 

3.  Execution  of  specialized  agents  (itinerary,  road  signs,  obstacle  detection  and  state  feedback  control) 
which  can  also  activate,  if  necessary,  more  specialized  agents  (here  the  obstacle  detection  module 
can  activate  both  the  moving  obstacles  module  and  the  stationary  obstacles  module). 

4. The supervisor analyses the diagnoses of the specialized agents.

5. The supervisor decides to act.

6. Actions.

Stages two, three and four correspond to the calculation of the new state of the world from
the point of view of the agent. To deal with complex and concurrent behaviors, stage five is performed by
hierarchical parallel automata (cf. figure 7). Transitions of the automata are functional expressions depending
on the current state of the world. Each state of each automaton is either an automaton itself or an
elementary state. Each elementary state is in charge of proposing an action to execute.
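
The structure described above (states that are either automata or elementary states, guards evaluated on the state of the world, several automata running in parallel) can be sketched as follows. This Python sketch is only an illustration under our own assumptions: the class names, the dictionary used as "world" and the example guards are hypothetical and are not the supervisor actually implemented in the platform.

```python
class Elementary:
    """Elementary state: proposes one action."""

    def __init__(self, action):
        self.action = action

    def step(self, world):
        return [self.action]


class Automaton:
    """Automaton whose states are Elementary, Automaton or Parallel objects."""

    def __init__(self, states, transitions, initial):
        self.states = states            # state name -> sub-state object
        self.transitions = transitions  # state name -> [(guard(world), target)]
        self.current = initial

    def step(self, world):
        # Fire the first transition whose guard holds for the current world.
        for guard, target in self.transitions.get(self.current, []):
            if guard(world):
                self.current = target
                break
        # Delegate to the active sub-state (hierarchy).
        return self.states[self.current].step(world)


class Parallel:
    """Runs several automata side by side and collects their proposals."""

    def __init__(self, *automata):
        self.automata = automata

    def step(self, world):
        actions = []
        for automaton in self.automata:
            actions.extend(automaton.step(world))
        return actions


# Hypothetical driver: one automaton for speed, one for steering.
speed = Automaton(
    states={"cruise": Elementary("accelerate"), "halt": Elementary("stop")},
    transitions={
        "cruise": [(lambda w: w["light"] == "red", "halt")],
        "halt": [(lambda w: w["light"] == "green", "cruise")],
    },
    initial="cruise",
)
steering = Automaton(
    states={"lane": Elementary("follow_trajectory")},
    transitions={},
    initial="lane",
)
driver = Parallel(speed, steering)
```

Calling `driver.step({"light": "red"})` makes the speed automaton move to its halting state while the steering automaton keeps proposing trajectory following, which is the kind of concurrency the hierarchical parallel automata are meant to express.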

The road signs module is, at the moment, in charge of determining the value of three parameters:

• speed limitation (real value),

• overtaking (YES | NO),

• crossroads priority (right of way | priority to the right | stop | traffic lights).

The itinerary module is in charge of determining the new direction at each crossroads, so this module
is only activated when the vehicle is near one of them. The obstacle detection module has to determine
whether there are possible intersections between the desired trajectory of the vehicle and the trajectories
of other vehicles and, if so, to propose a new trajectory. Actions managed by the state feedback
control module are:

for the guidance torque:

• follow_trajectory(actual position, desired trajectory, circle radius),

and for the motor torque:

• accelerate(desired speed),

• brake(),

• stop(distance),

• cover(distance, delay),

• follow(preceding vehicle actual speed, distance).

In order to simplify calculation, human vision is not completely simulated but is approximated by first using
global knowledge of the scene geometry and of the location of objects, and then using a visual sensor to
obtain qualitative information about the objects in the vision cone.

6  A  first  Prototype  of  Real-Time  Driving  Simulation 

A first implementation of this system has been realised as part of the car driving example. It is a modular
system whose synchronization and control are specified in the synchronous real-time language SIGNAL
[5]. Data communication between agents is realized by using the notion of Blackboard (common ...).

We describe now an example of a little town composed of some buildings (apartment
buildings, separated houses, church), a figure-eight-shaped road, a crossroads with traffic lights and seven
ground vehicles (cf. figure 8). There are three kinds of vehicles: one of them is driven by a user (by using a
mouse) and corresponds to the first vehicle of the platoon; three of them represent the other vehicles of the
platoon and the last three describe the dynamic environment.


LOO: List of Objects; MT: Motor Torque; GT: Guidance Torque; DOF: Degrees of Freedom
Figure  8:  Driving  example  composition. 

The internal frequencies of these modules differ considerably: the vehicle module has an internal
frequency that cannot be less than 50 Hz because of numerical convergence, while the sensors have an
internal frequency of 10 Hz and the visualization frequency should ideally be 25 Hz. The traffic lights
module does not require a frequency higher than 1 Hz. The execution of all these modules must be
synchronized to offer real and virtual drivers a realistic world.
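
One simple way to picture the synchronization of modules running at 50, 25, 10 and 1 Hz is a common base clock whose rate is the least common multiple of the module frequencies. The Python sketch below is an assumption made for illustration: the paper does not say that the global controller works this way, and the function names are hypothetical.

```python
from math import lcm  # available in Python 3.9 and later

# Module frequencies quoted in the text, in Hz.
FREQUENCIES_HZ = {
    "vehicle": 50,
    "visualization": 25,
    "sensors": 10,
    "traffic_lights": 1,
}


def build_schedule(frequencies):
    """Return the base tick rate and, per module, its period in base ticks."""
    base = lcm(*frequencies.values())
    periods = {name: base // f for name, f in frequencies.items()}
    return base, periods


def modules_due(tick, periods):
    """List the modules that must execute at the given base tick."""
    return [name for name, period in periods.items() if tick % period == 0]
```

With these frequencies the base clock runs at 50 Hz: the vehicle module executes at every tick, the visualization every second tick, the sensors every fifth tick and the traffic lights once every 50 ticks. For frequencies without such convenient ratios the least common multiple can become large, so this scheme is only a starting point.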

Figure 8 shows the synchronous dataflow communications between the modules of the example. There are
also some asynchronous event-based communications between the mouse module and the others:

Mouse => Traffic Lights: A reinitialization() message can be sent at any time, according to the user's
decision.

Mouse => Visualization: Different kinds of messages are sent to the visualization module to control the
point of view. For example, a message set_view(front | behind | right | left | upon | driver) changes
the viewpoint, whereas a message set_vehicle(1 | 2 | ... | 7) changes the view reference point.

Mouse => Global Controller: A termination() message is sent to the global controller when the user
decides to terminate the simulation.
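
The asynchronous messages listed above can be pictured as a small publish/subscribe mechanism. The following Python sketch is hypothetical: the `EventManager` class and its methods are our own illustration, and the message names follow our reading of the text rather than a documented interface of the platform.

```python
from collections import defaultdict


class EventManager:
    """Hypothetical dispatcher for asynchronous messages between modules."""

    def __init__(self):
        # (target module, message name) -> list of handler callables
        self._handlers = defaultdict(list)

    def subscribe(self, target, message, handler):
        self._handlers[(target, message)].append(handler)

    def send(self, target, message, *args):
        """Deliver a message immediately; return the number of handlers called."""
        handlers = self._handlers.get((target, message), [])
        for handler in handlers:
            handler(*args)
        return len(handlers)


# Example wiring for two of the messages sent by the mouse module.
state = {"view": "behind", "running": True}
events = EventManager()
events.subscribe("visualization", "set_view", lambda v: state.update(view=v))
events.subscribe("global_controller", "termination",
                 lambda: state.update(running=False))
```

After `events.send("visualization", "set_view", "driver")` the stored viewpoint becomes `"driver"`, independently of the data-flow cycle of the modules.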


7  Future  Work

In the current version of our system, the interaction of more than one user with the simulated environment
is not possible. We are currently working on the generalization of this model to deal with communicating
processes. Our goal is to allow the platform to run whatever the hardware configuration may be,
and to offer a high-quality multiuser interface. In addition, we are working on characterizing a
language that makes it possible to specify both agents and their different kinds of relations.

8  Conclusion 

We have presented in this paper a platform which makes it possible to simulate a line of autonomous vehicles
in an urban environment. The integration of descriptive, generative and behavioral models in the same
simulation platform offers each dynamic entity a richer and more realistic environment, and thereby
increases the possible interactions between an agent and its environment. This work is directly applied as
part of the PRAXITELE project. Our work will first make it possible to simulate the line of vehicles evolving in a
virtual environment, with the intention of giving experimenters the ability to test their control algorithms,
whose inputs are information from virtual sensors.


Figure 9: One example of an obtained simulation: the traffic light on his way being red, the user decides to
stop the first car of the platoon; the three others then also decelerate, while an automatically driven car
is passing through the crossroads.


ABSTRACT

The goal of this paper is to describe a virtual
reality application which requires a mobile installation,
and the experience gathered during the
preparation, realization, and execution of this project.

Recently, high-end virtual reality (VR) has
received much attention in the marketing field.
Professional advertisement agencies are trying to
exploit the fascination with this technology
among the public. IGD was approached by such an
agency and requested to prepare a high-quality VR
application, serving as a marketing event for the
Schweizerische Bankgesellschaft / Union de
Banques Suisses (SBG/UBS), the largest Swiss
bank. SBG wanted to attract the Swiss youth in
order to promote its Junior Bank Card.

This event was held in twelve major Swiss
cities during four weeks in May/June, 1994, with
daily presentations. The time schedule and the
marketing concept required a mobile installation.
The event was advertised as "Cyberspace Roadshow"
in newspapers and radio spots. This was
the first mobile, immersive, high-quality
VR installation worldwide. The complete VR
infrastructure was installed in a large truck. The
presentations were performed inside the truck as
well. The hardware infrastructure was comprised
of an image generator (SGI
Crimson/RealityEngine), VR peripherals
(head-mounted display, data glove, body tracking
systems), a sound generator (SGI Indigo), and a
large, stereoscopic rear projection screen using
polarizing light. Within such an immersive
environment, approximately 40 persons can be
involved. Wearing 3D glasses the audience can
experience stereoscopic viewing and follow,
passively, what one active person is controlling by
utilizing the VR devices.

For this event IGD created several entertaining
virtual worlds using its proprietary VR system.
The main idea was that one active player flies
through a tunnel and approaches a switch room with
three alternatives: riding through a jungle, playing
musical instruments or exploring a space labyrinth.
Each player had a time limit of approximately 5
minutes.


Keywords:  virtual  reality,  roadshow,  case  study 


INTRODUCTION

Almost  continuously  during  the  last  couple  of 
years  since  the  foundation  of  IGD's  virtual  reality 
demonstration  center  a  variety  of  VR  presentations 
have been realized. The presentations cover
demonstrations  at  scientific  conferences,  in-house 
consulting,  museum  exhibitions  as  well  as  fairs 
and  commercial  exhibitions.  The  main  goal  of  all 
performed  contract-based  VR  presentations  was  to 
attract  an  audience  by  featuring  VR  technology. 
This  shows  the  potential  of  VR  as  a  marketing 
instrument  in  business  management  (Felger  and 
Waehlert  1995). 

In  late  1993  the  advertising  agency  Bosch  & 
Butz  approached  IGD  with  the  concept  idea  of  a 
special  marketing  event  for  the  Schweizerische 
Bankgesellschaft  (SBG),  the  largest  Swiss  banking 
institute.  In  order  to  address  its  young  clients,  the 
Swiss  youth,  the  SBG  has  currently  a  marketing 
concept  which  concentrates  on  the  realization  of  a 
few  large  and  unique  events  per  year  rather  than 
numerous  small  ones.  The  motto  for  the  year  1994 
was  "the  year  of  adventures"  and  was  to  provide 
exciting  and  thrilling  experiences.  The  previous 
event was the European premiere of the Disney
movie  "Aladdin"  in  an  outdoor  cinema-setup  on 
top  of  a  mountain  in  the  Swiss  Alps  with  a  screen 
carved  into  a  glacier.  The  basic  idea  for  the 
event was to exploit the fascination of VR technology
for marketing and promoting the SBG. In
several Swiss cities people were to be able to
experience VR for themselves. The realization of
a roadshow was the ultimate goal. Although the
initial enthusiasm for VR has diminished, media
and the general public are still excited, open and
eager for a new experience (Cohen 1994).

This  paper  represents  a  case  study  and  is 
organized  according  to  the  following  structure: 
First, preparation activities and basic thoughts are
presented. Afterwards, realization aspects are discussed,
showing the problems of the project.
Later, experiences in the execution phase of the
project gained during the public presentations are
summarized.  The  paper  closes  with  an  outlook  on 
future  work  and  the  conclusions. 


PREPARATION  PHASE 

The project started in late February, 1994,
and was to be on the road around mid
May, 1994. This gave us less than three months
for the project preparation and realization.

After initial negotiations about costs and
considerations about the feasibility of the planned
event we decided to take the chance to explore
new terrain. Our partners at the bank and the
advertising agency neither posed any restrictions
on us nor enforced or suggested specific solutions
(only the glove was a must, because it was the
central component of the advertisement campaign,
symbolizing VR). We "only" had to realize
something exciting and thrilling. This left us with
complete freedom for our work, but also caused
some troubles and a lot of headaches.

We started to tackle the project with a team
of computer graphics and design people (supported
by students, of course). Although we only had a few
weeks left for designing and modeling the virtual
world and specifying and integrating new features
in our VR system, we were confident about the
realization of the event. Soon we decided on the
kind of overall system interaction: exploring an
adventure world. Shooting and quick-reaction
scenarios were abandoned for ethical and
technological (body tracking latency) reasons.
Think-and-learn content is not necessarily thrilling
for everyone.

We had to find solutions for two basic
challenge categories:

1. Logistic and technology challenges: For the
purpose of this marketing event a mobile
installation had to be planned which could tour
some ten major Swiss cities during a month's
time. Adequate locations had to be found and
prepared (e.g., administration clearance, power
supply, local support people, catering for
guests). The installation had to be robust
enough to run a continuous six hour show per
day and endure the intermediate transport.
Complicating the organization, three showmasters
had to be trained because there are
three different language regions in Switzerland
(French, German, and Italian). Furthermore,
customs and insurance issues had to be
solved.

2. System challenges: The experience had to be
tailored for use by anyone around the age of 17,
whatever their education and experience (i.e.,
SBG's Junior Bank Card holders). The major
demands for the virtual world and the system
were:

- Continuous motion in order to create interesting
worlds. No pure walkthrough or
flythrough.

- Provide an atmosphere beyond the pure
geometric architecture of a world.

- Dynamic worlds rather than static worlds.

- Easy, intuitive interaction which doesn't
require training or a longer introduction
(self-explaining content).

- Exciting and thrilling content (for users of
this age/generation).

- Participation of a larger audience (more
than 20 people).

- Quick turnaround times for players.

All these aspects had been taken into consideration
in order to realize an attractive, mobile,
immersive, high-quality VR installation. For
further reading, (Blinn 1994, Dodsworth 1994,
Giles et al. 1994, and Helman 1994) are recommended.


REALIZATION  PHASE 

This  chapter  covers  the  realization  aspects  for 
the  roadshow,  dealing  with  hardware  components, 
interaction  issues,  and  the  overall  story  of  the 
presentation.  The  VR  system  used  is  IGD's 
Virtual  Design  (Astheimer  et  al.  1993). 

Hardware  Components 

Fulfilling the requirements of touring across
the country and minimizing the time without
presentations led to the idea of a VR installation
inside a large truck (see Fig. 1). This allows the
complete infrastructure to be started up or shut
down in approximately one hour at each
show location. It was intended to run the shows in
public, open-air places and in large halls. As a
larger audience should be able to watch the active
players and their behavior within the virtual
worlds, the truck was open at one side during show
times. To avoid the brightness of sunlight, the
open side could be covered by a tent-like curtain.

The basic VR equipment for a player,
including a powerful graphics workstation and
peripherals (head-mounted display, navigation
device, tracking system), is very expensive. For
this reason we decided to install one high-quality
VR station for the active player and a stereoscopic
large-screen rear projection, utilizing polarizing
light, for passive observation by the audience (see
Fig. 2). Figure 3 shows the VR infrastructure of
the installation. It is comprised of an image generator
(SGI Crimson/RealityEngine with Multi-Channel
Option), VR peripherals (Virtual
Research Flight Helmet, VPL DataGlove,
Polhemus Fastrak), a sound generator (SGI
Indigo), and stereoscopic rear projection
(NEC/TAN). Within such an immersive environment,
approximately 40 persons can be involved.
Wearing 3D glasses the audience can experience
stereoscopic viewing and follow, passively, what
one active person is controlling by utilizing the VR
devices.


Fig.  1:  Roadshow  truck  with  VR  equipment 


Fig. 2: Presentation stage with action point (left)
and stereoscopic rear projection (right)

The Multi-Channel Option delivers a high-resolution
video signal (1280 x 1024 pixels) to
drive the stereoscopic projection in highest quality.
With a scan converter one signal is transformed
into a standard low-resolution NTSC signal to feed
the HMD. To reduce the image generation
complexity we decided to run the HMD in monoscopic
mode, because former presentations
proved that, due to the poor resolution of the
HMD used, its appeal lies in its wide field of
view and not in stereoscopic viewing. This will
certainly change when HMDs with higher resolution
are available/affordable.

Because the VR peripherals in use are fragile,
a replacement for the HMD and the tracking
system was always on board. Keeping our experience
with VPL's fibre optic glove for public
presentations in mind, it was used purely as a platform
to mount the hand tracker (Felger 1992). No
gesture interfacing was applied and it was
expected that the glove would not survive the
entire event. It was really only included because it
was the central component in the nationwide
advertising campaign.


Fig. 3: VR infrastructure

Interaction With The Virtual World

Various actions should be available to the
players when they experience a virtual world.
This covers interactions with objects representing
the virtual world as well as navigation
within the world.

Static worlds, observed in typical walkthrough
applications, can get boring very soon. To
create a vivid environment, dynamic object
behavior has to be integrated in a virtual world.
We introduced animations for lights, camera and
any world object. Although these animations
(mostly) have to be precalculated, and thus the objects
do not react in an intelligent manner to the players'
actions, this makes the worlds much more interesting.
Animations can be triggered or combined
with user interactions or run continuously for as
long as the player is within the virtual world. With
these features the resulting presentation becomes a
mixture of preprocessed animation and highly
interactive virtual reality, both running in harmony
under realtime requirements.

Navigating in virtual worlds required some
new approaches in this special setup. As the
DataGlove had to be integrated in any case,
because it symbolizes VR technology for the lay
audience, we had to work around its inherent
deficiencies. The DataGlove may cause problems
because it has to be calibrated to each player's
hand, and the fiber optic technology it uses is not
robust enough for permanent public use. Hygiene
is another issue. Another lesson learned from
previous events, and also strongly recommended by
Prof. Henry Fuchs from UNC during his talk at
IGD in April 1993, is: Don't fly!

People have problems with standard navigation
mechanisms based on the DataGlove (point &
fly). Thus we realized a carpet metaphor for
navigation, where the players can move freely with
respect to the constraints of cable length and
tracker range (i.e., physical borders of the action
point limit the movements) to perform small-scale
navigation. To travel longer distances they are
transported or beamed in a predefined manner.
This is similar to the Aladdin realization at
EPCOT done by Disney Imagineering except that
we do not use a real carpet and do not have the
capability to control the carpet.

The  players  trigger  the  movement  over  long 
distances  in  general  by  touching  an  appropriate 
object  in  the  virtual  world.  This  object  represents 
the  movement  behavior  (e.g.,  moving  along  a 
specific  path  for  a  certain  time).  Moreover,  such  a 
basic  interaction  mechanism  is  generic  and  can  be 
applied  and  configured  for  different  virtual  worlds 
(no  explicit  programming  necessary).  This  is 
realized  as  a  list  of  touchable  (triggerable)  objects 
with  associated  actions.  Initial  actions  can  be 
started  when  entering  or  switching  to  a  new  virtual 
world.  The  following  actions  can  be  distinguished 
(combinations  are  possible  too): 

1) Animation of objects and lights in terms of
transformations. Animations can loop endlessly
or according to a given repetition
counter.

2)  Animation  of  the  camera  (view  of  the 
observer)  in  terms  of  transformations.  Various 
modes  are  possible: 

-  Additional  head  movement  is  allowed:  yes 
or  no. 

-  Interpolation  from  a  current  camera  view  to 
a  given  starting  view  is  enabled:  yes  or  no. 


-  Accelerate  or  slow  down  the  camera 
movement. 

-  Camera  animations  can  loop  endlessly  or 
according  to  a  given  repetition  counter. 

3)  Trigger  a  sound  event. 

4)  Switch  between  two  virtual  worlds. 

5)  Switch  the  visibility  of  objects  between  visible 
and  invisible. 

6) Animate an object in terms of changing its
geometric representation. An alternating
object sequence can loop endlessly or according
to a given repetition counter.

Presentation Story

The  idea  of  the  story  was  to  generate  an 
eventful  and  complex  adventure  world  consisting 
of  several  subworlds  which  were  connected  by  a 
simple  transition  structure.  Furthermore,  with 
respect  to  the  expected  players  and  audience,  the 
Junior  Bank  Card  was  to  play  a  major  role  in  the 
story.  With  the  integration  of  special  effects  (see 
below)  a  certain  atmosphere  or  mood  is  created  for 
each  world.  Due  to  time  pressure,  only  a  few  ideas 
could  be  realized  (e.g.,  virtual  rain  was  skipped). 
In  general,  sound  helps  a  lot  to  create  a  mood  and 
is  used  throughout  the  adventure  world  (Astheimer 
1993). 

At the beginning, the players were flying,
helicopter-like, around the truck (see Fig. 4) until
they were transported towards the action point
inside the truck. This symbolizes the shift from
the real world into the virtual real world and
represents the start of the virtual journey.


Fig. 4: Helicopter flight around the virtual truck

After  arriving  at  the  action  point  the  players 
faced  a  contomat  (SBG's  teller  machine,  see  Fig. 
5).  Up  to  this  point  they  were  completely  guided 
by  the  system,  which  now  changes  to  interactive 
mode.  By  touching  the  contomat  with  the  hand  the 
players  triggered  the  transition  from  the  virtual 
real  world  (truck  with  environment)  to  the  virtual 
imaginative  worlds. 





Fig.  5:  Contomat  (teller  machine) 


The players were then sucked into the card
feeder and found themselves flying through a
tunnel (see Fig. 6). For a certain time they
proceeded through the tunnel with good (blue sky
texture) and bad (yellow/red texture) objects
(cubes) approaching them. They were meant to try
to touch the good objects and avoid contact with
the bad ones. Appropriate sound signals were
generated in both cases. The bank card was
always flying ahead, guiding the players through
the tunnel and releasing particles, but never
reachable.


Fig.  6:  Tunnel  flight 

Next, the players approached a switch room
(see Fig. 7) where they selected one out of three
alternative worlds in order to proceed with the
journey. The bank card dissolved into three parts
which disappeared into these three worlds. The
choice between the corresponding virtual worlds
was symbolized by color and a typical object from
each world (a palm tree indicated the jungle world,
trumpets indicated the stage world, a planet
indicated the space world). The players made their
selection to enter one world simply by touching
one of the big arrows just beneath them. We
called this switch room the "tower of fate" because
its design was inspired by the Star Wars scene
where the Jedi master fought Darth Vader on a
narrow bridge overlooking a precipice. Here,
spaceship-like flashing control lights were
displayed.


Fig.  7:  Switch  room  (tower  of  fate) 

In the jungle world (see Fig. 8) an elephant
ride through a jungle was simulated. The players
could touch animals which lived in this jungle. By
doing so they were collecting parts of the bank
card, whose fragments were attached to the back
of the player's hand. Each successful hit was
signaled by a sound event.


Fig.  8:  Jungle 

The stage world (see Fig. 9) provided a
scenario where the players flew across a fictitious
audience onto a stage, where they were surrounded
by fantastic instruments (e.g., keyboard, guitar,
chorus). Furthermore, a light show was performed
on the stage. Touching an instrument played a short
melody fragment from a song, and the
instruments moved as long as the music played.
Keeping the instruments randomly playing
rewarded the players with parts of the bank card.
(Our original idea was to force the players to
remember and replay a given melody sequence.
But this was considered too restrictive to be solved
by everyone.)

The space world was placed in a universe
with moving asteroids and planets. It was
designed as a labyrinth with five nodes and multiple
connections (e.g., tube, staircase) between
these nodes (see Fig. 10). Each node had
four exits. Touching one exit meant selecting this
connection to the neighbor node and being moved
along its path. In some nodes parts of the bank
card were placed, where they could be collected.


Fig.  9:  Stage 


Fig.  10:  Space  labyrinth 


Fig.  11:  Firework 


Finally, after a certain time the entire experience
ended with a colorful and loud fireworks
display (see Fig. 11).


EXECUTION  PHASE 
Show  Procedure 

The roadshow toured across Switzerland
from May 16 until June 15, 1994, with stops in
Zurich (May 16-18), Winterthur (May 19), Aarau
(May 20-21), Geneva (May 24-26), Lausanne
(May 27-29), Sierre (May 30), Bern (May 31 -
June 2), Zug (June 3-5), St. Gallen (June 6-7),
Chur (June 8-9), Lugano (June 10-12), and Lucerne
(June 13-15). In each city a local group of volunteers
from the SBG supported the event. At some
locations the show was held inside a festival or
exhibition hall, at others directly in front of the
SBG building or in a public place. The entrance
regulation was at the discretion of the local group.
Some charged bank card holders 5 SFR and
non-holders 15 SFR, others gave card holders free
entrance. Mostly one free drink was included.

A typical presentation day featured 50-minute
shows with a 10-minute break between two shows,
running from 4 pm until 10 pm. Mostly, IGD staff
gave a very brief introduction to VR technology
and its applications at each show start (Encarnacao
et al. 1994). A show master guided the players
and the audience in French, German, or Italian,
reflecting multilingual Switzerland.
Performing one VR experience together with
interviewing the player took approximately 5
minutes. Most of the time an audience of
between 10 and 50 persons attended one show.
We had days with a total of roughly 400 persons
sharing the presentations. Players were selected
by the show master or by means of a lottery. The
players started with the sequence of the virtual
truck and the approach to the contomat; this was
followed by the tunnel flight, the switch room, and
either the space labyrinth, the stage, or the jungle
ride; it concluded with the fireworks. Each player
was rewarded with a unique "Cyberspace Roadshow
T-Shirt".

Media  Coverage 

The roadshow was promoted by a nationwide
advertising campaign. This included posters,
radio spots by local radio stations, and a preview
in SBG's newsletter "megascene". Furthermore,
two press conferences were held, one at the tour
start in Zurich, and the other one later in Geneva.

The response in the media was enormous and
very positive. Already during the first week the
roadshow had coverage in most major Swiss
newspapers and the public TV station. This
continued steadily over the full runtime of the
event. In total, approximately 100 press articles
were counted, as well as four TV spots (one even
from an Austrian TV station), and five radio
interviews.

Observations 

After watching the large-screen stereoscopic
projection in highest resolution, many players
were disappointed by the poor resolution of the
HMD (90k primary pixels). Some complained
about the integrated system guidance or expressed
the feeling that they were not really mastering the
world (sometimes the operator "helped" to interact
via keyboard function keys). Another complaint
was about the jerky motion due to low frame rates
(approximately 5-10 frames/sec. depending on the
world complexity) and tracker faults, which
resulted in annoying pauses.

As in numerous earlier presentations, it was
very hard to explain that all computations and
computer graphics were done online under real-time
requirements and that no prerecorded tape or
video was involved. On the other hand many people
were very interested in this, starting some PC-level
discussions to figure out the enormous computational
work load.

We got all types of persons (e.g., from kids of
kindergarten age to retired adults, male/female,
computer-educated/non-computer-educated).
Most people did like the experience, a few were
enthusiastic, a few did not like it at all. Groups
cheered on the players. Some people were very
passive and showed a TV-like stiff behavior, some
were very active (too active in fact for the tracking
system). Very tall or small people (kids) did not
match our predefined settings and they sometimes
had extraordinary perspectives. The most popular
world was the space labyrinth, where surfing the
staircase up or down was especially appealing.

Lessons Learned

A roadshow and its show locations have to be
planned very carefully. Luckily, we did not have
to face unsolvable problems, but it seems to be
wise to have the truck driver check the locations
well in advance. Although the enormous size of the
truck was known to everybody, sometimes its delicate
maneuverability was not adequately considered.

If you want many people coming to see the
show, it is very important to place it somewhere in
a central open air place, rather than at a remote
location. For example, in Geneva the show was
held at the fairgrounds (Palexpo) at the city
periphery and was rather sparsely attended. At
Lugano the truck stopped in the main square near
the lake front (Piazza Riforma) and it was almost
impossible to handle the crowds waiting in the
surrounding street cafes.


You should expect a great variety of people
joining the show. Even when the target audience
seems to be of a certain age (Junior Bank Card
holder's age is less than 20), be prepared for
everybody. Make a comprehensive dry run of
your software and the created worlds with such a
large variety of people, not only the ones available
in your lab. If possible, be flexible enough to
adjust your software or world during the show.

You should set aside enough time for the
design and creation of your virtual worlds. Spend
at least half of the development effort on this and
you will get creative worlds. Less than three
months for the preparation and realization of such a
project was extremely short and could only be
tackled by a highly integrated, cooperative team.

Our biggest surprise was not having any
major problems with the VR infrastructure. The
complete equipment ran without failure during the
roadshow. There was not one show which had to
be delayed or cancelled. Still, it is reassuring
to have redundant equipment with you. Running
the hardware over months and years in our lab tells
us that you cannot expect such good behavior.

Although HMD and DataGlove are the
classical VR symbols, think twice about whether
the intended presentation will really need them. For
this roadshow they were a requirement and we dealt
with it by implementing dedicated interaction
techniques which worked fine. If you have the
freedom to choose the equipment, wait for really
robust and comfortable HMDs and gloves, or
avoid their use.


FUTURE WORK

During the roadshow and the weeks after the
roadshow IGD received many inquiries about this
Cyberspace Roadshow. Many had the same goal,
to use a mobile VR system for marketing
purposes. With a few companies which can afford
such an installation, further negotiations are
ongoing. You can expect one future event in mid
1995 in Germany. Confidentiality reasons prohibit
a disclosure at this time.

As IGD is a research institute, we are always
looking for new challenges and do not want to
duplicate finished projects. We are improving our
VR system with respect to requirements for entertainment
and leisure applications. The integration
of a personal motion platform lies within a
medium-term time scale.


CONCLUSIONS

This paper described the Cyberspace Roadshow
which was performed for the Schweizerische
Bankgesellschaft (SBG) in Switzerland. The associated
work in preparing, realizing, and executing
this roadshow was described in detail as a case
study.

For this project the interaction concept of
IGD's VR system was redesigned and extended.
Interactions can be specified in a data file referencing
world objects and predefined actions. Thus
interesting living or reacting worlds can be easily
defined and evaluated. It has been proved that a
mobile, high-quality, immersive VR installation is
feasible. Moreover, recently a mobile driving
simulator has been prototyped (Latham 1994).

For the SBG this marketing event was a big
success, although the response was not as big as
predicted. As usual for marketing events, SBG
conducted a questionnaire along with the roadshow, but the
results are not available to the authors. According
to the SBG, especially the extensive media coverage
across Switzerland has significantly strengthened
SBG's image of dealing with innovations
and being known for new ideas "off the beaten
track". SBG spent about 250,000 SFR to enable
this event (NN 1994).


Abstract

Remote sensing (RS) satellites produce large amounts of scientific data. With the outlook of many more RS
sensors to come and the massive amount of data related with that, the need for good visualisation tools is
stronger than ever. Virtual Environments (VEs) are promising tools to perform visual analysis of this type of
data in an intuitive manner. A prototype system is described, which employs VE technology, for the interactive
visualisation of remote sensing data. The system facilitates the interactive visualisation of multiple large
remote sensing datasets from different sources, as well as remote sensing datasets in relation with data
originating from other sources, e.g., terrain data from geographic information systems. The system is based on
an existing visualisation system (IRIS Explorer) that has been enhanced with several data management and
rendering functions. Data management functions include region-of-interest selection, geometric data
correlation, image data filtering, and geometry decimation. The existing Explorer rendering capabilities have
been enhanced with motion parallax based on head-tracking, texture-mapping functions to combine image data
with geometric terrain models, and dynamic adaptation of display resolution. These functions allow the user to
display significantly larger amounts of remote sensing data at real-time frame rates, and interactively control
the trade-off between real-time (rendering speed) and realism (data complexity).


1. INTRODUCTION

Researchers in the physical sciences nowadays have to deal with massive amounts of data originating from
both experiments and simulations. The problem of exploring these data in search of meaningful results has
recently been recognised. The National Science Foundation (NSF) report "Visualization in Scientific
Computing" [1] explicitly mentions earth resource satellites as sources of incredible amounts of data.
Obviously, the increasing number of satellites, both for earth observation and for exploration of other planets,
as well as the increasing data density of these information sources will require the use of more advanced
visualization techniques if scientists want to continue examining these data in a useful way. Current scientific
visualization systems still rely on conventional, window based 2D user interfaces. The system described here
aims at introducing user interface techniques from virtual environment (VE) technology in scientific
visualization applications. The combination of interactive stereoscopic display with the ability to intuitively
change the point-of-view allows the user to move around his data sets more easily, freeing him to concentrate
on the exploration of their contents. Our system is a prototype, designed to be open towards future extension
into a more general system. It incorporates a number of important elements of virtual environment technology,
most notably stereoscopic display, including head-slaved motion parallax. It allows the interactive exploration
of large remote sensing data sets in an intuitive manner.

The remainder of this article is structured as follows. Section 2 provides some background on various sources
of remote sensing data and current visualisation practices. The architecture of the prototype system is described
in section 3. In section 4 some preliminary results are shown. Conclusions are drawn and suggestions for
future work are made in section 5.


2. BACKGROUND

2.1  Remote  sensing 

In 1991 the European Space Agency (ESA) launched the ESA Remote Sensing satellite-1 (ERS-1). Since that
time a huge amount of data has become available to the user community. ERS-1 has a number of sensors,
among which are the Synthetic Aperture Radar (SAR), the wind scatterometer (WSC), the radio altimeter (RA),
and the along track scanning radiometer (ATSR). These sensors produce data of a very different nature. Many
geophysical parameters can be obtained from these sensors. To name a few: sea surface temperature (ATSR),
sea surface winds (WSC, RA), mean sea surface (RA). In the near future, satellites like the ERS-2 and
ENVISAT will add many new sensors to this list. This will dramatically increase the amount of information
available about our planet. It is a challenging task for scientists to translate this information to products which
can be used by end-users of Remote Sensing data. Comparing RS data with either data from other sources or
with other RS data might provide new insights into their relations. From this knowledge new applications of
RS data can be generated. Virtual reality is a very promising tool to perform this type of data fusion in an
intuitive manner. Especially with the outlook of many more RS sensors to come and the massive amount of
data related with that, the need for good visualization tools is stronger than ever.

2.2  Scientific  visualisation 

Over the last years visualization has become an accepted tool for scientific researchers. The motivation for
using visualization techniques is the ever increasing deluge of data that researchers have to deal with. The size
of these data sets makes it impossible to explore, analyze and understand their contents unless they are
represented in pictorial form.

Two broad categories of visualization systems can be distinguished. The first consists of more-or-less "closed"
application programs. An example from this category is the Data Visualizer from Wavefront. Open application
construction toolkits form the second category. They allow the user to assemble a visualization application
from predefined modules, each of which implements some basic function. By feeding the output from one
module into another (or into several others), networks of function modules may be created that implement a
particular visualization application. Examples of visualization toolkits from the second category are AVS [2] and IRIS
Explorer [3]. Recently, another interesting approach has been described by Easier et al. [4]. They have
developed a spreadsheet-like tool specifically aimed at organising, browsing, and manipulating large numbers of
RS images. Images are organised in a matrix of cells, and the user can interactively specify formulas to derive
new images, using image processing, image analysis, and rendering primitives, analogous to the use of
traditional spreadsheets to organise and manipulate text-based data.

When the user has facilities at his disposal for the manipulation of data sets and can influence the way in
which they are depicted, with a response time between his actions and the display of the resulting image that is
sufficiently short, he is effectively "put in the visualization loop". Because of the size of the data sets involved
and the large amounts of computing resources required for this, systems for interactive visualization need to
offer ways in which the user can select different trade-offs between image detail and response time. If such a
system has been well designed it allows the user to interactively explore huge data sets, quickly focus on areas
of interest that may require more detailed study and on the whole analyze his data much more effectively than
by "brute force" numerical analysis methods.


3. THE SIEVE SYSTEM

Recently, TNO-EEL has been working on the development of a prototype system, employing Virtual
Environment technology, for the interactive visualisation of remote sensing data. This project is carried out for
ESA within the framework of its Technology Research Programme (TRP). The aim of the prototype is to provide
a "quick-look" tool for the remote sensing user community. For this system the name SIEVE (Scientific
Interactive Exploration with Virtual Environments) has been proposed. The SIEVE prototype facilitates the
interactive visualisation of multiple large remote sensing datasets from different sources, as well as remote
sensing datasets in relation with data originating from other sources, e.g., terrain data from geographic
information systems. In some sense, the SIEVE system is similar to the NASA Ames Virtual Planetary
Exploration (VPE) system, although this is specifically aimed at the visualisation of geometric planetary
terrain models of, e.g., Mars [5].

3.1  System  overview 

The SIEVE prototype is based on IRIS Explorer, an interactive visualisation environment, running on Silicon
Graphics workstations under the IRIX operating system [3]. IRIS Explorer is a modular interactive
visualisation environment based on the data flow paradigm. All software components of the SIEVE prototype
are implemented as Explorer modules. The linking structure between the modules is provided by the IRIS
Explorer dataflow mechanism. This setup supports a modular and generic architecture with a high flexibility
towards future adaptations or upgrades.
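
To make the data flow paradigm concrete, the following minimal Python sketch shows modules that forward their output to connected downstream modules. It is an illustration only and does not use or reproduce the IRIS Explorer API. The class Module, its methods, and the stand-in functions are hypothetical.

```python
class Module:
    """A node that applies a function and forwards the result downstream."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.downstream = []
        self.last_output = None

    def connect(self, other):
        # Link this module's output to another module's input.
        self.downstream.append(other)
        return other  # allows chaining: a.connect(b).connect(c)

    def push(self, data):
        # Compute this module's output and fire all connected modules.
        self.last_output = self.func(data)
        for module in self.downstream:
            module.push(self.last_output)


# Hypothetical three-stage network; the lambdas are stand-ins for real work.
reading = Module("Reading", lambda path: [[1, 2], [3, 4]])
mapping = Module("Mapping", lambda lattice: [v for row in lattice for v in row])
display = Module("Display", lambda geometry: f"{len(geometry)} primitives")

reading.connect(mapping).connect(display)
reading.push("scene.dat")
```

Replacing one stage only requires connecting a different module, which is the kind of flexibility towards adaptations that a modular architecture aims for.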

The prototype contains a set of functions that allow the user to visualise a number of data products from ERS-1,
SPOT and JERS-1, both from imaging and non-imaging sensors. All RS data can be combined with other
RS data or with digital elevation models (DEMs), e.g., from the Digital Land Mass System (DLMS) database,
where appropriate. The user interface provides control over all data manipulation and rendering functions. The
user can select (multiple) datasets to be visualised and controls the type of rendering operations. Attributes
added to the data can be selected for visualisation to improve classification. The position of the observer in the
virtual database world and his viewing direction are taken into account in the generation of stereoscopic
images in order to provide a sense of "immersion" of the user in the RS data. Manipulation of the viewing
parameters offers scaling and zooming in on details of the data. The user may interactively select the required
level of detail of the visualised data set. Together these functions allow the user to concentrate on exploring the
contents of the data in an intuitive way. A high-level functional block diagram of the system shows the main
components (figure 1). The large amount of data involved (a single ERS-1 SAR image consists of 8000 x 8000
16 bit pixels) and the requirements for real-time display, especially when head-slaved motion parallax is
employed, demand appropriate data reduction facilities at each stage of the pipeline. The techniques employed
are described in more detail below.
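
The data volume quoted above can be checked with a short calculation. The sketch below, in Python, computes the raw size of one 8000 x 8000 image with 16 bit pixels and the size left after uniform sub-sampling. The function names and the sub-sampling rate of 8 are assumptions chosen for illustration.

```python
def image_bytes(width, height, bits_per_pixel):
    """Raw storage needed for an uncompressed image."""
    return width * height * bits_per_pixel // 8


def subsampled_bytes(width, height, bits_per_pixel, rate):
    """Storage after keeping every rate-th pixel in both directions."""
    w = -(-width // rate)   # ceiling division
    h = -(-height // rate)
    return image_bytes(w, h, bits_per_pixel)


full = image_bytes(8000, 8000, 16)             # 128,000,000 bytes
reduced = subsampled_bytes(8000, 8000, 16, 8)  # 2,000,000 bytes
```

A single scene therefore occupies about 128 MB before any reduction, which suggests why data reduction is needed at every stage of the pipeline.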


Figure  1.  Main  system  components. 


3.2 Reading


Figure 2 below illustrates the data reading facilities of the SIEVE prototype. In a given RS application, a user
typically reads in a number of image data sets and a digital elevation model (DEM) on which the image data
should be mapped. In order to keep the amount of data read into memory manageable, the user specifies a
region-of-interest (in latitude/longitude coordinates) and a rate at which each image data set is sub-sampled.
The Warp module does a geometric transformation and filtered interpolation to fit different images (possibly of
different resolutions) onto each other (image registration). After having been read in, image data and terrain
DEMs are internally represented as a standard Explorer data type called lattice.
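
Region-of-interest selection combined with sub-sampling can be sketched as follows. This Python example is a simplified illustration and not the SIEVE reading module. It assumes the image is a list of rows on a regular latitude/longitude grid, with row 0 at the northern edge and column 0 at the western edge. The function name and all parameters are hypothetical.

```python
def select_roi(image, lat_top, lon_left, deg_per_pixel, roi, rate):
    """Crop image to roi = (lat_min, lat_max, lon_min, lon_max),
    then keep every rate-th pixel in both directions."""
    lat_min, lat_max, lon_min, lon_max = roi
    rows, cols = len(image), len(image[0])
    # Convert geographic bounds to pixel indices, clamped to the image.
    r0 = max(0, int((lat_top - lat_max) / deg_per_pixel))
    r1 = min(rows, int((lat_top - lat_min) / deg_per_pixel) + 1)
    c0 = max(0, int((lon_min - lon_left) / deg_per_pixel))
    c1 = min(cols, int((lon_max - lon_left) / deg_per_pixel) + 1)
    return [row[c0:c1:rate] for row in image[r0:r1:rate]]


# Small synthetic 8 x 8 image where each pixel encodes row*10 + column.
image = [[r * 10 + c for c in range(8)] for r in range(8)]
patch = select_roi(image, lat_top=8.0, lon_left=0.0, deg_per_pixel=1.0,
                   roi=(4.0, 7.0, 2.0, 5.0), rate=2)
```

Only the cropped and sub-sampled pixels need to be held in memory, so the cost grows with the selected region rather than with the full scene.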

Figure  2.  Data  reading  functions. 


3.3  Mapping 

The Mapping function translates the lattices resulting from the Reading function into other lattices or
geometric primitives. For example: a SAR image is a 2-dimensional array of scalars, where the scalars
represent the radar back scatter. The Mapping function allows the user to, for instance, map these scalars onto
z-values so the radar back scatter shows up as height in an artificial DEM, or the user can map the scalars onto
colour values. One particular mapping function, called DisplaceLat, is used to map the scalar height values
of a digital elevation model to actual Z coordinates. The DisplaceLat module transforms a 2-D lattice with
elevation data into a 3-D lattice, whereby the user can scale the elevation. The 3-D lattice containing
geographical (lat, long, height) coordinates is then passed to the Projector module. The Projector module
transforms the (lat, long, height) coordinates to a cartesian (X, Y, Z) coordinate system according to one of the
map projections that can be selected by the user.
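
The two steps just described, displacement followed by projection, can be sketched in Python. This is not the code of the DisplaceLat or Projector modules. The function names are hypothetical, and the equirectangular projection and the mean Earth radius are assumptions standing in for whichever map projection the user selects.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumption)


def displace_lattice(heights, lat0, lon0, dlat, dlon, scale=1.0):
    """Turn a 2-D lattice of heights into a 3-D lattice of
    (lat, lon, height * scale) tuples on a regular grid."""
    return [[(lat0 + i * dlat, lon0 + j * dlon, h * scale)
             for j, h in enumerate(row)]
            for i, row in enumerate(heights)]


def project_equirectangular(lattice3d, ref_lat=0.0):
    """Map (lat, lon, height) in degrees/metres to cartesian (X, Y, Z)
    in metres using a simple equirectangular projection."""
    k = math.cos(math.radians(ref_lat))
    return [[(EARTH_RADIUS_M * math.radians(lon) * k,
              EARTH_RADIUS_M * math.radians(lat),
              h)
             for (lat, lon, h) in row]
            for row in lattice3d]


heights = [[0, 10], [20, 30]]
lattice3d = displace_lattice(heights, lat0=0, lon0=0, dlat=1, dlon=1, scale=2)
xyz = project_equirectangular(lattice3d)
```

Keeping the elevation scale in the displacement step and the projection in a separate step mirrors the split into two modules described above, so either can be exchanged independently.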
The output of the Projector module is passed to a module that transforms the lattice into a geometry
representation. Explorer provides a standard module for this purpose, called LatToGeom, that uses a
straightforward triangulation algorithm. This algorithm transforms an N x M element DEM into a triangle
mesh of roughly 2 x N x M elements. We have enhanced the functionality of the LatToGeom module by
incorporating a decimation algorithm. This algorithm groups together the coplanar triangles that resulted
from the triangulation process to form larger triangles [6]. This reduction in the number of output triangles
significantly speeds up the rendering process.
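
A straightforward grid triangulation of the kind referred to above can be sketched as follows. This Python example is an illustration, not the LatToGeom source, and it omits the decimation step. The function name and the choice of diagonal are assumptions. Each grid cell is split into two triangles, giving exactly 2 x (N-1) x (M-1) triangles, which is roughly 2 x N x M for large grids.

```python
def triangulate_grid(n_rows, n_cols):
    """Return triangles as index triples into a row-major vertex list."""
    triangles = []
    for i in range(n_rows - 1):
        for j in range(n_cols - 1):
            a = i * n_cols + j      # upper-left vertex of the cell
            b = a + 1               # upper-right
            c = a + n_cols          # lower-left
            d = c + 1               # lower-right
            triangles.append((a, b, c))
            triangles.append((b, d, c))
    return triangles


mesh = triangulate_grid(100, 100)   # 2 * 99 * 99 triangles
```

For a 100 x 100 element DEM this already yields 19,602 triangles, so merging coplanar triangles in flat regions can reduce the rendering load considerably.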


Figure  3.  Data  mapping  functions. 


3.4  Display 

The interactive presentation of the geometry to the user is performed by the Display function. Given the
geometry from the Mapping function and input from the user, like viewpoint, view direction and other viewing
parameters, the Display function generates the image belonging to these inputs. The standard Explorer display
module, called Render, provides facilities for, e.g., interactive manipulation of objects and light sources,
rendering parameters, etc.

When  the  standard  Explorer  functions  are  used  the  ouiput  of  the  mapping  process  is  a  pure  geometry  i.e,  a 
description  in  terms  of  nothing  but  geometric  primitives  of  the  scene  that  will  be  displayed  to  user.  Usually  in 
remote  sensing  applications,  when  data  sets  from  different  sources  are  combined,  they  are  of  widely  differing 
resolutions.  For  instance,  an  ERS-1  SAR  image  has  a  resolution  of  12.5  x  12.5  meter,  while  a  corresponding 
DEM  of  the  same  geographic  area  has  a  resolution  of  the  order  of  100  x  100  meter.  When  both  data  sets  are 
read-in,  transformed  into  lattices,  and  finally  turned  into  geometric  representations  for  rendering,  it  is  the  size 
of  the  high-resolution  data  set  that  determines  the  size  of  the  resulting  geometry,  and  thus  the  size  of  the 
terrain  that  can  be  displayed  at  reasonable  frame-rates.  When  the  hardware  platform  provides  real-time  texture 
mapping  support,  this  can  be  exploited  by  mapping  (part  of)  the  high-resolution  (SAR)  image  as  a  texture  on 
the geometric representation of the digital elevation model. Although Explorer provides no standard
functionality for texture mapping, the underlying Inventor software of the Explorer Render module does
support it. We have extended the Explorer Render module interface with an input port that accepts a lattice
which is to be mapped onto the geometry that arrives at the standard geometry input port. The advantage of this
approach  is  that  the  bottleneck  of  the  rendering  pipeline,  which  in  this  case  turns  out  to  be  the  geometry 
processing,  is  significantly  widened,  allowing  the  display  of  much  larger  areas  from  the  remote  sensing  data 
sets  than  was  previously  possible. 

A final enhancement to the Render module concerns the dynamic switching of display resolutions. The
standard Explorer Render module provides a drawing mode called "move-lo-res". As long as the user moves
the viewpoint, the geometry to be rendered is subsampled, resulting in far fewer triangles being rendered,
and thus much faster frame rates. As soon as the viewpoint remains static, the whole scene is rendered at full
resolution. For our application, the implementation of this mode was considered unsatisfactory, because the
input geometry is subsampled in only one direction. The result is that during movement, the terrain being
rendered is shown as a collection of parallel strips. We have achieved resolution reduction of the scene by
adding an extra input geometry port to the Render module, through which a uniformly subsampled version of
the high-resolution geometry is input. Depending on whether the viewpoint is being moved or remains static,
the Render module now chooses the appropriate input port (lo-res or hi-res). An additional benefit of this
approach is that the resolution of the lo-res geometry is now under user control.
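
The switching scheme described above can be sketched in a few lines of Python. This is a minimal illustration and not the Explorer module code: the function names (subsample_uniform, triangle_count, select_geometry) and the stride of 4 are assumptions made for the example, and the lattice is a plain nested list rather than an Explorer lattice.

```python
def subsample_uniform(lattice, stride):
    """Keep every `stride`-th sample along BOTH axes of a 2D lattice."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return [row[::stride] for row in lattice[::stride]]


def triangle_count(lattice):
    """A regular lattice is triangulated with two triangles per grid cell."""
    rows = len(lattice)
    cols = len(lattice[0]) if rows else 0
    return 2 * max(rows - 1, 0) * max(cols - 1, 0)


def select_geometry(hi_res, lo_res, viewpoint_moving):
    """Pick the lo-res input while the viewpoint moves, hi-res once it is static."""
    return lo_res if viewpoint_moving else hi_res


# Hypothetical 200 x 300 height lattice (the values are placeholders).
hi_res = [[0.0] * 300 for _ in range(200)]
lo_res = subsample_uniform(hi_res, stride=4)  # the stride is the user-controlled knob

print(triangle_count(hi_res), triangle_count(lo_res))
```

Because the stride is applied to rows and columns alike, the lo-res terrain keeps its proportions instead of degenerating into parallel strips. The exact count of two triangles per grid cell is slightly below the "roughly 2 x N x M" figure used in the text.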


Figure  4.  Display  functions. 

To give a researcher improved depth perception, the system offers optional head-tracked stereoscopic
display facilities. These are provided by the StereoGraphics CrystalEyes product. The
CrystalEyes stereo glasses enable the user to perceive depth by fusing two alternately displayed images, one for
each eye.

Another depth cue can be obtained when the user's head movements are tracked. When the head movements
are incorporated in the viewpoint of the rendering module, head motion parallax of the rendered images can be
obtained. The sensor supported by the SIEVE system is the Logitech ultrasonic head tracker. This device is
integrated in the CrystalEyes/VR glasses. The motion parallax gives an extra depth cue that significantly
increases depth perception. Both the stereo viewing and the head tracking can be used together to get
optimal depth perception, but they can be used separately as well [7].


4.  RESULTS 

The target hardware for the SIEVE prototype is a Silicon Graphics Onyx workstation equipped with the
RealityEngine2 graphics option. The target machine has two CPUs and 128 MB of main memory. The
graphics subsystem comprises a single pipe with multi-channel option (MCO). The performance figures
mentioned below were obtained on this system.

The use of the SIEVE system is best illustrated by means of an application. Typical example applications for
the SIEVE prototype are:

1. Combined use of DEM and multi-temporal SAR data. A DLMS elevation data set is combined with
multiple SAR images of the same area, but recorded at different times.

2. Combined use of SPOT and SAR. This application allows the user to visually compare the contents of
optical SPOT imagery with ERS-1 radar imagery.

3. Combined use of ERS-SAR and JERS-SAR. Combined use of multi-frequency, multi-temporal data is
known to improve classification results.

4. Visualisation of multi-temporal change detection and optical imagery. In this demo two SAR images,
taken a year apart, are used to detect changes in backscatter. The difference is displayed as height. This
can then be compared with either one of the SAR images or with the SPOT image.

5. Visualisation of non-imaging sensors. Combined visualisation of sea surface height, sea surface
temperature, and mean significant wind velocity. Correlations, if any, can be sought.

For performance analysis purposes, two implementations of application 2 were made. In the first one, all image
and terrain data were rendered as pure geometry, while in the second one, the images were texture mapped
onto the geometric terrain model.

In the first trial, the data sets all had a resolution of 200 x 300. The LatToGeom module produced
about 2 x 200 x 300 = 120K triangles. The Render module was able to render this geometry at a frame rate of
approx. 4 Hz, with one head light and a Phong lighting model. This means we have obtained a performance of
nearly 500K triangles per second, whereas the manufacturer claims 1.6M triangles per second, albeit unlit and
flat-shaded. Thus, if we want to obtain a 25 Hz frame rate in this case, the geometry should be restricted to at
most 20K triangles. In other words, the Digital Elevation Map should contain at most 10,000 elements. Taking
the aspect ratio of a standard UTM grid (10 x 15 km) into account, this means the size of the DEM should be
something like 80 x 120 pixels, where each element represents an area of 125 x 125 m, which is actually not
too far off the original resolution of the DEM.
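
The arithmetic behind these estimates can be checked with a short script. It is only a sketch, under the assumption that rendering cost scales linearly with the number of triangles; the function names are made up for the example. Using the unrounded measurement (120K triangles at 4 Hz) gives a budget of 19,200 triangles at 25 Hz, which the text rounds to 20K.

```python
import math


def triangle_budget(measured_triangles, measured_hz, target_hz):
    """Triangles per frame affordable at target_hz, assuming constant throughput."""
    throughput = measured_triangles * measured_hz  # triangles per second
    return throughput / target_hz


def grid_for_budget(max_triangles, aspect_w, aspect_h):
    """Largest w x h grid with w:h = aspect_w:aspect_h and 2*w*h <= max_triangles."""
    max_elements = max_triangles / 2  # two triangles per DEM element
    scale = math.sqrt(max_elements / (aspect_w * aspect_h))
    return math.floor(aspect_w * scale), math.floor(aspect_h * scale)


measured = 2 * 200 * 300                     # triangles in the first trial
budget = triangle_budget(measured, 4, 25)    # per-frame budget at 25 Hz
width, height = grid_for_budget(budget, 10, 15)
cell_size_m = 10_000 / width                 # a 10 km edge spread over `width` elements

print(budget, (width, height), cell_size_m)
```

With these inputs the script reproduces the 80 x 120 grid and the 125 m element size quoted above.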

During a second trial, the texture mapping capabilities of the hardware were used. The AddTex module
enables the combination of DEMs with satellite images with different resolutions. As explained above, DEMs
generally have a resolution of approx. 100 x 100 m, whereas ERS-1 SAR images, for instance, have a
resolution of 12.5 x 12.5 m. Using a DEM with a size of 60 x 120 elements, resulting in over 14,000 triangles,
yielded a frame rate of 33.3 Hz, independent of the texture size, which varied between 512 x 512 texels and
1024 x 2048 texels. A DEM almost twice this size (80 x 160 elements), resulting in over 25,000 triangles,
yielded a frame rate of 16.7 Hz. These results corroborate our previous performance estimates, based on the
results of the first trial.

Stereo display was not used during the performance trials. Stereo viewing of course requires two images to be
drawn, one for each eye. This means that the frame rate for stereo projection will be halved.

Figure 5 shows an image of application 2, the combination of optical SPOT with ERS-1 radar data. The area
covered by these datasets is in the vicinity of Zwolle in the Netherlands. The river shown in the center is the
IJssel. The hills in the left part of the image are the foothills of the Veluwe range. Note that terrain height is
exaggerated. The use of the stereoscopic workstation for the interactive navigation of the terrain models is
shown in figure 6.


Figure 5. Perspective rendering of optical SPOT and ERS-1 SAR images on a 3D terrain model.


Figure  6.  Stereoscopic  workstation  in  use. 


5.  CONCLUSIONS  AND  FURTHER  WORK 

We have described a system for the interactive stereoscopic visualisation of remote sensing data. A number
of data reduction methods are employed to meet the requirements for real-time rendering. All of these methods
can be controlled by the user. This allows the interactive selection of an optimum trade-off between real-time
(rendering speed) and realism (data complexity) in a given application.

The performance measurements described in section 4 were obtained with a Phong lighting model. Later
measurements on an IRIS Indigo gave a performance improvement of a factor of two when we switched to a
lighting model using base colors. Although it is dangerous to assume the same performance gain on a different
machine, some further performance improvement may be expected on the target hardware as well. Finally, the
geometry decimation facilities were not used during the trials. This means that a further reduction of the
number of triangles, and thus a performance increase, can be expected.

Currently, SIEVE renders stereoscopic image pairs, which are presented to the user through liquid crystal
shutter glasses. This method currently provides a good compromise between image quality and a sense of
immersion through the stereoscopic depth illusion. Eventually we would like to be able to use a head-mounted
display in order to achieve total immersion in the virtual environment. However, the requirements imposed by
scientific visualisation applications with respect to resolution and field-of-view are currently beyond what is
offered by available HMDs.

Extending the 3D presentation facilities of SIEVE with 3D manipulation capabilities is a next step, which
requires the use of 3D interaction devices. A natural candidate for this would be the Logitech 3D mouse device.
This allows the user to point within the "virtual holographic display" presented in front of the monitor and
select or manipulate objects. The addition of facilities for the automatic selection of level-of-detail and region
of interest would further improve the capabilities of IRIS Explorer to deal with large data sets.

The SIEVE system is still under development. Therefore, no definite conclusions can be drawn at this time.
However, because representatives from the remote sensing community are closely involved in the project, we
feel confident that the result of the development effort will be a useful tool for them.


BIBLIOGRAPHY 

[1] B. H. McCormick, T. A. DeFanti, and M. D. Brown, eds., "Visualization in Scientific Computing,"
Computer Graphics, Vol. 21, No. 6, Nov. 1987.

[2] C. Upson, et al., "The Application Visualization System: A Computational Environment for Scientific
Visualization," IEEE Computer Graphics and Applications, Vol. 9, No. 7, 1989.

[3] IRIS Explorer User's Guide, Document Number 007-1371-010, Silicon Graphics Computer Systems,
Mountain View, CA., 1993.

[4] A. F. Hasler, K. Palaniappan, and M. Manyin, "A high performance Interactive Image Spreadsheet
(IISS)," Computers In Physics, Vol. 8, No. 3, May/Jun. 1994.

[5] Michael W. McGreevy, "Virtual Reality and Planetary Exploration," in: Alan Wexelblat, "Virtual Reality:
Applications and Explorations," Academic Press, Cambridge, MA., 1993.


[6] W. J. Schroeder, J. A. Zarge and W. E. Lorensen, "Decimation of Triangle Meshes," ACM Computer
Graphics (Proc. SIGGRAPH '92), Vol. 26, No. 2, Jul. 1992.

[7] Michael Deering, "High Resolution Virtual Reality," ACM Computer Graphics (Proc. SIGGRAPH '92),
Vol. 26, No. 2, Jul. 1992.


ACKNOWLEDGEMENTS 

The work described here was performed under ESA/ESTEC contract 10475«WG<^), and was also sponsored
by the Netherlands Agency for Aerospace Programs (NIVR) under contract NRT 2305 FE.



Further  Development 
of  the 

Responsive  Workbench 

Bernd Fröhlich    Berthold Kirsch
Wolfgang Krüger    Gerold Wesche

Dept. of Visualization and Media Systems Design
German National Research Center for Computer Science
Sankt Augustin, Germany

E-mail: bernd.froehlich@gmd.de

January  18,  1995 


Abstract 

The Responsive Workbench [8] is designed to support end users (scientists,
engineers, physicians, and architects) working on desks, workbenches, and tables
with an adequate human-machine interface. Virtual objects are located on a real
“workbench”. The objects, displayed as computer generated stereoscopic images,
are projected onto the surface of a table. The participants operate within a non-immersive
virtual environment. A “guide” uses the virtual environment while several
observers can watch events by using shutter glasses. Depending on the application,
various  input  and  output  modules  have  been  integrated,  such  as  motion,  gesture 
and  voice  recognition  systems  which  characterize  the  general  trend  away  from  the 
classical  multimedia  desktop  interface. 

The system is explained and evaluated in several applications: A virtual patient
serves as an example for non-sequential medical training. The car industry benefits
from areas like rapid prototyping for exterior design and interactive visualization
and examination of flow field simulations (virtual windtunnel, mixing processes).
Visualization and verification of experiments with mobile instrument deployment
devices in outer space missions are another fascinating application. Architecture and
landscape design form another discipline well suited for the workbench environment.

1 Motivation

The  standard  metaphor  for  human-computer  interaction  arose  from  the  daily  experience  of 
a  white-collar  office  worker.  For  the  last  20  years  desktop  systems  have  been  enhanced  more 
and  more,  providing  tools  such  as  line  and  raster  graphics,  WIMP  (Window  Icon  Mouse 
Pointer)  graphical  user  interfaces  and  advanced  multimedia  extensions.  With  the  advent 
of  immersive  virtual  environments  the  user  finally  arrived  in  a  3D  space.  Walkthrough 
experiences,  manipulation  of  virtual  objects,  and  meetings  with  synthesized  collaborators 
have  been  proposed  as  special  human-computer  interfaces  for  the  scientific  visualization 
process.  Specific  interfaces,  originally  developed  for  pilots  and  telepresence  tasks,  became 
available  to  the  ordinary  user  (see  [7],  for  example). 

The dream of the ultimate medium, which uses all channels of human perception,
has guided the efforts of user interface design towards these virtual reality systems. Unfortunately,
head-mounted displays, body-tracking suits, and force-feedback exoskeletons
are obtrusive. These systems separate the users from each other. Especially in scientific
visualization applications, comprehensive attempts have been made to overcome these
drawbacks. The BOOM systems allow for easy-to-use walkthrough and object manipulation
experiences [3]. The surround-screen projection-based virtual environment CAVE
[2] was designed for several users to become immersed with their whole body in a virtual
space.

All these approaches to future user interface systems have one point in common: the design
of an (almost) universal interface based on the most advanced computer and display technology
available.

Another approach to the design problem for future human-computer interfaces is rigorously
centered on the user’s point of view. Myron Krueger pioneered this attempt
with his work on non-immersive responsive environments [7]. Application-oriented visualization
environments have been proposed and built to support a specific problem-solving
process. The computer acts as an intelligent server in the background providing necessary
information across multi-sensory interaction channels (see [4], [10], for example).

We developed the Responsive Workbench concept, first described in [8], as an alternative
model to the multimedia and virtual reality systems of the past decade. Analyzing the
daily working situation of such different computer users as scientists, architects, pilots,
physicians, and professional people in travel agencies and at ticket counters, we recognized
that there is little acceptance of a simulation of working worlds in a desktop
environment. Generally, users want to focus on their tasks rather than on operating the
computer. Future computer systems should use and adapt to the rich human living and
working environments, becoming part of a responsive environment.

2  System  description 

During the analysis of the working environment and of the behaviour of the specialists,
we recognized that the (cooperative) tasks of this class of users rely on a “workbench”
scenario. The future impact of desk-like user interfaces in general has been discussed in
[9]. Using a projector, a large mirror and a special glass plate as table top, we built an
appropriate virtual environment.

Figure 1: Set-up for a stereoscopic display of virtual objects on a desk


Virtual objects and control tools are located on a real “workbench” (see Figure 1). The
objects, displayed as computer generated stereoscopic images, are projected onto the surface
of the workbench. The projection parameters are tuned such that the virtual objects
appear above the table. Depending on the application, various input and output modules
can be integrated, such as motion, gesture and speech recognition systems. A responsive
environment, consisting of powerful graphics workstations, tracking systems, cameras,
projectors and microphones, replaces the traditional multimedia desktop workstation.

The most important and natural manipulation tool for virtual environments is the
user’s hand. Our environment depends on the real hand, not a computer-generated representation.
The user wears a data glove with a Polhemus sensor mounted on the back.
Gesture recognition and collision detection algorithms, based on glove and Polhemus data,
compute the user’s interaction with the virtual world objects.

To get correct stereoscopic rendering from any location around the workbench, the
system must keep track of the guide’s eye positions. We realized this by mounting a
Polhemus sensor on the side of the shutter glasses. It delivers position and orientation
data for the head, allowing the system to calculate the position of each eye. The
collaborators see the stereoscopic images with only slight distortions as long as they stay
close to the guide.

The Responsive Workbench setup generates a very effective 3D impression, which is
due to the negative parallax, the wide angle of view and the head tracking. None of the
users suffered from motion sickness using the workbench, which happens often with head-mounted
displays. This seems due to the non-immersive nature of our approach. People
still have fixed points in their environment, so their senses do not get irritated.

Figure 2: Cooperative work of a physician and a student
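
The step from a tracked head pose to two eye positions, as used for the guide's stereoscopic rendering, can be illustrated with a small sketch. It is not the workbench implementation. The representation of the orientation as a 3x3 rotation matrix, the interocular distance of 0.065 m, the sensor-to-eye-midpoint offset and the convention that the head's local x axis points to the wearer's right are all assumptions made for the example.

```python
def rotate(matrix, vec):
    """Apply a 3x3 rotation matrix (given as rows) to a 3-vector."""
    return tuple(sum(matrix[r][c] * vec[c] for c in range(3)) for r in range(3))


def eye_positions(sensor_pos, sensor_rot,
                  sensor_to_midpoint=(0.0, 0.0, 0.0), ipd=0.065):
    """Return (left_eye, right_eye) in world coordinates.

    sensor_pos         -- tracked sensor position in world coordinates
    sensor_rot         -- tracked orientation as a 3x3 rotation matrix
    sensor_to_midpoint -- hypothetical offset from the sensor to the point
                          midway between the eyes, in the head's local frame
    ipd                -- assumed interocular distance in metres
    """
    offset = rotate(sensor_rot, sensor_to_midpoint)
    midpoint = tuple(p + o for p, o in zip(sensor_pos, offset))
    # Half the eye separation along the head's local x axis (wearer's right).
    half = rotate(sensor_rot, (ipd / 2.0, 0.0, 0.0))
    left = tuple(m - h for m, h in zip(midpoint, half))
    right = tuple(m + h for m, h in zip(midpoint, half))
    return left, right


# Head turned 90 degrees about the vertical (z) axis, sensor at (1, 2, 3).
yaw_90 = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
left_eye, right_eye = eye_positions((1.0, 2.0, 3.0), yaw_90, ipd=0.06)
print(left_eye, right_eye)
```

Each eye position would then serve as the center of projection for one image of the stereo pair. Since the sensor sits on the side of the glasses, a non-zero sensor_to_midpoint offset would be needed in practice.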

3  Applications 

Based on current research projects in the fields of computer graphics, human-computer
interfaces and visualization, the following applications have been embedded in this new
type of environment, following the suggestions of the end users involved.

3.1 Medicine

3.1.1  Nonsequential  training 

This scenario is based on a life-sized model of a patient. Figure 2 shows the model
in a teacher/student scenario. The patient’s skin can become transparent, making the
arrangement of the bones visible. The surgeon or student can then pick up a bone with
the data glove and examine its joints, or take a closer look at the bone itself. The virtual
patient could be examined in any detail through the zoom operation. Covered parts could
be exposed by removing the obscuring bones or organs with the hand or by making them
transparent. The dynamic aspects of many processes inside the human body are especially
important for understanding them. We implemented two primary cases: the spatially
exact reconstruction of the beating heart and the blood flow inside the transparent heart.


3.1.2  Simulation  system  for  ultrasound  heart  examinations 

This research project has been developed in close cooperation with the Center for Pediatrics
of the University of Bonn, Department of Cardiology, Germany. A typical user team is
made up of a radiologist, a surgeon and a visualization specialist.

Originally, the project was designed on a multimedia workstation. Recently we started
to implement the system on the Responsive Workbench to meet the requirements of
the surgeons for a virtual environment. They want to see the organ of interest and the
measurement process in real or magnified size from all points of view in 3D space. They
also would like to compare the simulation with the images on TV screens originating from
the scanning process.

Detailed visualizations of the beating heart can be explored as interactive animations.
The user can rotate the model in order to examine the structural and dynamic features of
the heart. Different visualization modes (e.g., transparent, with/without blood circulation)
are available. The complex interior structures and dynamics of the heart, valves, and blood
can thus be examined (see Figure 3).


Figure  3:  Examination  of  the  blood  flow  in  a  human  heart 


3.2  Architecture  and  design 

For the design and discussion process in architecture, landscape and environmental planning
we implemented a basic testbed for demonstrations.

An architectural model is shown on the workbench, in our case the area around the
buildings of our research institute. In front of the table two architects discuss the model,
moving around buildings or other objects, such as trees in the virtual world. Additionally,
light sources can be set by the data glove to simulate different times of the day.

Figure 4: Virtual windtunnel scenario for car manufacturing applications
(aerodynamical study model ASMO-II).

3.3 Automotive industry

In  cooperation  with  scientists  and  engineers  of  the  research  department  of  Daimler-Benz 
AG,  Stuttgart,  we  implemented  two  applications  concerned  with  fluid  dynamic  simulations 
on  supercomputers. 

3.3.1  Virtual  windtunnel 

This application realizes the virtual windtunnel scenario [1] (see Figure 4) in the Responsive
Workbench setting. The simulation data is taken from a finite element program
running on a supercomputer or a high-end workstation. In a preprocessing step the data points
from the finite element mesh are resampled to a regular grid to speed up particle tracing.
Particle tracing directly on finite element meshes is more accurate, but the additional computational
cost restricts the number of particles that can be handled simultaneously.
The geometry data is also extracted from the finite element mesh and somewhat polished
by a modeling system, e.g. by adding textures. A few precomputed streamlines are added
as an overview of the flow field.
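
To illustrate why a regular grid makes particle tracing cheap, the following sketch looks up the velocity by trilinear interpolation and advances a particle with a midpoint (second-order Runge-Kutta) step. It is an assumption-laden example and not the workbench code: the grid layout grid[i][j][k] = (u, v, w) with unit spacing, the choice of integrator and the names sample and trace are all hypothetical. The resampling from the finite element mesh is not shown.

```python
import math


def sample(grid, pos):
    """Trilinearly interpolate the velocity at pos.

    grid[i][j][k] holds a velocity tuple (u, v, w); spacing is one unit and
    every dimension must contain at least two samples.
    """
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    base, frac = [], []
    for p, n in zip(pos, dims):
        p = min(max(p, 0.0), n - 1.0)        # clamp to the grid extent
        i = min(int(math.floor(p)), n - 2)   # lower corner of the enclosing cell
        base.append(i)
        frac.append(p - i)
    i, j, k = base
    fx, fy, fz = frac
    result = [0.0, 0.0, 0.0]
    for di, wx in ((0, 1.0 - fx), (1, fx)):
        for dj, wy in ((0, 1.0 - fy), (1, fy)):
            for dk, wz in ((0, 1.0 - fz), (1, fz)):
                velocity = grid[i + di][j + dj][k + dk]
                weight = wx * wy * wz
                for c in range(3):
                    result[c] += weight * velocity[c]
    return tuple(result)


def trace(grid, seed, dt, steps):
    """Advance one particle from seed with midpoint steps and return its path."""
    pos = tuple(seed)
    path = [pos]
    for _ in range(steps):
        v1 = sample(grid, pos)
        mid = tuple(p + 0.5 * dt * v for p, v in zip(pos, v1))
        v2 = sample(grid, mid)
        pos = tuple(p + dt * v for p, v in zip(pos, v2))
        path.append(pos)
    return path


# Hypothetical 4 x 4 x 4 grid with a uniform flow along the x axis.
uniform = [[[(1.0, 0.0, 0.0) for _ in range(4)] for _ in range(4)] for _ in range(4)]
print(trace(uniform, (0.5, 1.5, 1.5), dt=0.25, steps=4)[-1])
```

On a regular grid the enclosing cell is found by a floor operation, whereas tracing on an unstructured finite element mesh needs a point-location search per step. This is the trade-off between accuracy and particle count mentioned above.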

The stylus serves as a particle injector to examine any area around the car in detail.
The particle generation rate and particle lifetimes are adjustable. The velocity values of the
flow field are globally scalable, even if this is not physically realistic.

3.3.2 Mixing process


The dynamics of the mixing process, generated by a supercomputer simulation, are visualized
with the aid of fluid particles as rendering primitives. The essential physical properties
to be visualized are the velocity field, pressure, temperature and fuel distribution. The
mixing process is strongly time-dependent, so the data rate is much higher. The visualization
shows the particle flow with color coded temperature during the injection process.
These particle paths are precomputed during the finite volume simulation. The current
implementation focuses on the interactive real-time exploration of the temperature and
pressure distribution inside the cylinder with arbitrary cutting planes. The cutting plane
is attached to the stylus, which allows easy positioning. The finite element data is again
converted to a regular grid, which serves as a 3D texture on the SGI Reality Engine 2
rendering system.


3.4 Simulation and control of outer space experiments


In cooperation with Deutsche Forschungsanstalt für Luft- und Raumfahrt e.V. (DLR) and
other partners a mobile instrument deployment device prototype (IDD) will be developed.


Figure  5:  IDD  TEMl  implemented  by  DLR  and  Uni  Duisburg 


A mobile IDD is a small microrobot for positioning instruments on Mars or other
space bodies to explore the near vicinity of the landing location. It is not possible to
test the IDD under Martian conditions or to control it on Mars directly. The first project
stage studies the possible walking styles of an IDD and identifies the necessary data for a
precise simulation of its behaviour. In a later stage the Responsive Workbench is meant
to display remotely sensed terrain data including the position of the IDD and the lander
for simulation and planning of experiments.

An IDD prototype vehicle has been developed by Transmash, St. Petersburg. It consists
of three container segments which are coupled by two traverses. In its most compact position
the size of the IDD is about 35x20x7 cm. The IDD moves by rotation of the container
segments which hold the instruments. The IDD has been further developed by DLR and
the University of Duisburg (see Figure 5). Dynamics and kinematics are simulated using
“MOBILE” [5], a multibody modeling system.

A computer controlled crawling or walking style can be developed in the Responsive
Workbench environment. The main problems are: which moving styles are possible, and which
information (input sensors) is needed to control speed, direction and walking style or to
program autonomous movement (reaction to obstacles, keeping a given direction, etc.) of the
IDD robot [6].

Following the successful simulation of a safe walking (crawling) path in the virtual
environment at the ground station, the driving code is sent to the IDD operating on
another planet. Data measured by the IDD and the lander will be sent back to calculate the
next steps and to update the visualization. This control loop is necessary to synchronize
the remote and the virtual environment.

Typical operating sequences for an IDD are the approach to a preselected site, appropriate
positioning of the instruments at the object, preparation of the object for measurements,
the measurement procedure, acquisition and analysis of a surface sample, digging
to acquire and analyse a sub-surface sample, returning material to the lander for further
analysis, and providing additional information for the selection of the next site.

The operating sequences are prepared and tested in the virtual environment. The
lander station sends its data to the ground station. These data are used to construct the
actual virtual world where the scientist acts. The scientist decides on the next action,
teaches the new goal, e.g. by pointing to the target site, and runs the experiment within
the virtual world. If the experiment has been successful, the appropriate commands are
sent to the IDD on the planet. When the new situation on the planet has been incorporated
into the virtual world, the next sequence can start.

4  Conclusions 

The Responsive Workbench system is designed to demonstrate the ideas and power of future
cooperative responsive environments. Further applications under consideration for
this virtual workbench are the simulation of air and ground traffic at airports,
a training environment for complicated mechanical tasks (e.g., taking apart a machine for
repair), landscape design and environmental studies via terrain modeling, and physically
based modeling of virtual objects (“virtual clay”). These applications also rely on the
workbench metaphor, but require specific interaction and I/O tools.